Search results for: DYSARTHRIA DETECTION, SPEECH RECOGNITION, SPEECH SYNTHESIS, INTERPRETABLE DEEP LEARNING MODELS - Bridge of Knowledge

  • Playback detection using machine learning with spectrogram features approach

    Publication

    - Year 2017

    This paper presents a 2D image processing approach to playback detection in automatic speaker verification (ASV) systems using spectrograms as the speech signal representation. Three feature extraction and classification methods, namely histograms of oriented gradients (HOG) with support vector machines (SVM), Haar wavelets with an AdaBoost classifier, and deep convolutional neural networks (CNN), were compared on different data partitions in respect...

    Full text available to download

  • Toward Robust Pedestrian Detection With Data Augmentation

    Publication

    In this article, the problem of creating a safe pedestrian detection model that can operate in the real world is tackled. While recent advances have led to significantly improved detection accuracy on various benchmarks, existing deep learning models are vulnerable to changes in the input image that are invisible to the human eye, which raises concerns about their safety. A popular and simple technique for improving robustness is using data...

    Full text available to download

  • A comparative study of English viseme recognition methods and algorithm

    The paper considers an elementary visual unit, the viseme, in the context of preparing the feature vector as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative study of their effectiveness. In the course of the study an optimal feature vector...

    Full text available to download

  • A comparative study of English viseme recognition methods and algorithms

    The paper considers an elementary visual unit, the viseme, in the context of preparing the feature vector as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative study of their effectiveness. In the course of the study an optimal feature vector construction...

    Full text available to download

  • Creating new voices using normalizing flows

    Publication
    • P. Biliński
    • T. Merritt
    • A. Ezzerg
    • K. Pokora
    • S. Cygert
    • K. Yanagisawa
    • R. Barra-Chicote
    • D. Korzekwa

    - Year 2022

    Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. Firstly, we create an approach for TTS...

    Full text available to download

  • The Innovative Faculty for Innovative Technologies

    A leaflet describing the Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology. The Multimedia Systems Department describes its laboratories and prototypes of: Auditory-visual attention stimulator, Automatic video event detection, Object re-identification application for multi-camera surveillance systems, Object Tracking and Automatic Master-Slave PTZ Camera Positioning System, Passive Acoustic Radar,...

    Full text to download in external service

  • Ranking Speech Features for Their Usage in Singing Emotion Classification

    Publication

    - Year 2020

    This paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected low-level MPEG-7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...

    Full text available to download

  • PHONEME DISTORTION IN PUBLIC ADDRESS SYSTEMS

    Publication

    - Year 2015

    The quality of voice messages in speech reinforcement and public address systems is often poor. The sound engineering projects of such systems take care of sound intensity and possible reverberation phenomena in public space without, however, considering the influence of acoustic interference related to the number and distribution of loudspeakers. This paper presents the results of measurements and numerical simulations of the...

  • INVESTIGATION OF THE LOMBARD EFFECT BASED ON A MACHINE LEARNING APPROACH

    Publication

    The Lombard effect is an involuntary increase in the speaker’s pitch, intensity, and duration in the presence of noise. It makes it possible to communicate in noisy environments more effectively. This study aims to investigate an efficient method for detecting the Lombard effect in uttered speech. The influence of interfering noise, room type, and the gender of the person on the detection process is examined. First, acoustic parameters...

    Full text available to download

  • Geometric Algebra Model of Distributed Representations

    Publication

    - Year 2010

    A formalism based on geometric algebra (GA) is an alternative to the distributed representation models developed so far: Smolensky's tensor product, Holographic Reduced Representations (HRR) and Binary Spatter Code (BSC). Convolutions are replaced by geometric products, interpretable in terms of geometry, which seems to be the most natural language for visualization of higher concepts. This paper recalls the main ideas behind the GA model and investigates...

  • Selected Technical Issues of Deep Neural Networks for Image Classification Purposes

    In recent years, deep learning and especially Deep Neural Networks (DNN) have obtained amazing performance on a variety of problems, in particular in classification or pattern recognition. Among many kinds of DNNs, the Convolutional Neural Networks (CNN) are most commonly used. However, due to their complexity, there are many problems related but not limited to optimizing network parameters, avoiding overfitting and ensuring good...

    Full text available to download

  • Distortion of speech signals in the listening area: its mechanism and measurements

    Publication

    - Year 2014

    The paper deals with the problem of how the number and distribution of loudspeakers in speech reinforcement systems influence the quality of publicly addressed voice messages, namely speech intelligibility in the listening area. Linear superposition of time-shifted broadband waves of the same form and slightly different magnitudes that reach a listener from numerous coherent sources is accompanied by interference effects...

    Full text to download in external service

  • Elimination of clicks from archive speech signals using sparse autoregressive modeling

    Publication

    This paper presents a new approach to elimination of impulsive disturbances from archive speech signals. The proposed sparse autoregressive (SAR) signal representation is given in a factorized form - the model is a cascade of the so-called formant filter and pitch filter. Such a technique has been widely used in code-excited linear prediction (CELP) systems, as it guarantees model stability. After detection of noise pulses using linear...

    Full text to download in external service

  • Thermal Images Analysis Methods using Deep Learning Techniques for the Needs of Remote Medical Diagnostics

    Publication

    - Year 2020

    Remote medical diagnostic solutions have recently gained more importance due to global demographic shifts and play a key role in the evaluation of health status during an epidemic. Contactless estimation of vital signs with image processing techniques is especially important since it allows for obtaining health status without the use of additional sensors. Thermography enables us to reveal additional details, imperceptible in images acquired...

    Full text available to download

  • Vocalic Segments Classification Assisted by Mouth Motion Capture

    Visual features convey important information for automatic speech recognition (ASR), especially in noisy environments. The purpose of this study is to evaluate to what extent visual data (i.e. lip reading) can enhance recognition accuracy in the multi-modal approach. For that purpose, motion capture markers were placed on speakers' faces to obtain lip-tracking data during speaking. Different parameterization strategies were tested...

    Full text to download in external service

  • Detection of Alzheimer's disease using Otsu thresholding with tunicate swarm algorithm and deep belief network

    Publication

    - Frontiers in Physiology - Year 2024

    Introduction: Alzheimer’s Disease (AD) is a degenerative brain disorder characterized by cognitive and memory dysfunctions. The early detection of AD is necessary to reduce the mortality rate through slowing down its progression. The prevention and detection of AD are an emerging research topic for many researchers. Structural Magnetic Resonance Imaging (sMRI) is an extensively used imaging technique in the detection of AD, because...

    Full text available to download

  • Deep CNN based decision support system for detection and assessing the stage of diabetic retinopathy

    Publication

    - Year 2018

    Diabetic retinopathy is a disease caused by long-standing diabetes. Lack of effective treatment can lead to vision impairment and even irreversible blindness. The disease can be diagnosed by examining digital color fundus photographs of the retina. In this paper we propose a deep learning approach to automated diabetic retinopathy screening. Deep convolutional neural networks (CNN) - the most popular kind of deep learning algorithms...

    Full text to download in external service

  • Deep Learning-Based, Multiclass Approach to Cancer Classification on Liquid Biopsy Data

    Publication

    - IEEE Journal of Translational Engineering in Health and Medicine-JTEHM - Year 2024

    The field of cancer diagnostics has been revolutionized by liquid biopsies, which offer a bridge between laboratory research and clinical settings. These tests are less invasive than traditional biopsies and more convenient than routine imaging methods. Liquid biopsies allow studying of tumor-derived markers in bodily fluids, enabling the development of more precise cancer diagnostic tests for screening, disease monitoring, and...

    Full text available to download

  • Voice command recognition using hybrid genetic algorithm

    Publication

    Speech recognition is a process of converting the acoustic signal into a set of words, whereas voice command recognition consists in the correct identification of voice commands, usually single words. Voice command recognition systems are widely used in the military, control systems, electronic devices, such as cellular phones, or by people with disabilities (e.g., for controlling a wheelchair or operating a computer...

    Full text available to download

  • Bimodal classification of English allophones employing acoustic speech signal and facial motion capture

    A method for automatic transcription of English speech into International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent the visual data related to lip reading can enhance recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...

    Full text to download in external service

  • Deep Learning

    Publication

    - Year 2021

    Deep learning (DL) is a rising star of machine learning (ML) and artificial intelligence (AI) domains. Until 2006, many researchers had attempted to build deep neural networks (DNN), but most of them failed. In 2006, it was proven that deep neural networks are one of the most crucial inventions for the 21st century. Nowadays, DNN are being used as a key technology for many different domains: self-driven vehicles, smart cities,...

    Full text to download in external service

  • From Linear Classifier to Convolutional Neural Network for Hand Pose Recognition

    Publication

    Recently gathered image datasets and the new capabilities of high-performance computing systems have allowed developing new artificial neural network models and training algorithms. Using the new machine learning models, computer vision tasks can be accomplished based on the raw values of image pixels instead of specific features. The principle of operation of deep neural networks resembles more and more what we believe to be happening...

    Full text available to download

  • Data Domain Adaptation in Federated Learning in the Breast Mammography Image Classification Problem

    We are increasingly striving to introduce modern artificial intelligence techniques in medicine and elevate medical care, catering to both patients and specialists. An essential aspect that warrants concurrent development is the protection of personal data, especially with technology's advancement, along with addressing data disparities to ensure model efficacy. This study assesses various domain adaptation techniques and federated...

    Full text to download in external service

  • AITP - AI Thermal Pedestrians Dataset

    Efficient pedestrian detection is a very important task in ensuring road safety, especially after sunset. One way to achieve this goal is to use thermal imaging in conjunction with deep learning methods and an annotated dataset for model training. In this work, such a dataset has been created by capturing thermal images of pedestrians in different weather and traffic conditions. All images were manually annotated...

    Full text to download in external service

  • DevEmo—Software Developers’ Facial Expression Dataset

    The COVID-19 pandemic has increased the relevance of remote activities and digital tools for education, work, and other aspects of daily life. This reality has highlighted the need for emotion recognition technology to better understand the emotions of computer users and provide support in remote environments. Emotion recognition can play a critical role in improving the remote experience and ensuring that individuals are able...

    Full text available to download

  • Selection of Features for Multimodal Vocalic Segments Classification

    Publication

    English speech recognition experiments are presented employing both audio signal and Facial Motion Capture (FMC) recordings. The principal aim of the study was to evaluate the influence of feature vector dimension reduction on the accuracy of vocalic segments classification employing neural networks. Several parameter reduction strategies were adopted, namely: Extremely Randomized Trees, Principal Component Analysis and Recursive...

    Full text to download in external service

  • Decoding imagined speech for EEG-based BCI

    Publication
    • C. A. Reyes-García
    • A. A. Torres-García
    • T. Hernández-del-Toro
    • J. S. Garcia Salinas
    • L. Villaseñor-Pineda

    - Year 2024

    Brain–computer interfaces (BCIs) are systems that transform the brain's electrical activity into commands to control a device. To create a BCI, it is necessary to establish the relationship between a certain stimulus, internal or external, and the brain activity it provokes. A common approach in BCIs is motor imagery, which involves imagining limb movement. Unfortunately, this approach allows few commands. As an alternative, this...

    Full text to download in external service

  • Speaker Recognition Using Convolutional Neural Network with Minimal Training Data for Smart Home Solutions

    Publication

    - Year 2018

    With technological advancements in the smart home sector, voice control and automation are key components that can make a real difference in people's lives. The voice recognition technology market continues to evolve rapidly, as almost all smart home devices provide speaker recognition capability today. However, most of them provide cloud-based solutions or use very deep Neural Networks for the speaker recognition task, which are...

    Full text to download in external service

  • Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech

    Publication
    • D. Piotrowski
    • R. Korzeniowski
    • A. Falai
    • S. Cygert
    • K. Pokora
    • G. Tinchev
    • Z. Zhang
    • K. Yanagisawa

    - Year 2023

    In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...

    Full text to download in external service

  • Impact of Visual Image Quality on Lymphocyte Detection Using YOLOv5 and RetinaNet Algorithms

    Lymphocytes, a type of leukocytes, play a vital role in the immune system. The precise quantification, spatial arrangement and phenotypic characterization of lymphocytes within haematological or histopathological images can serve as a diagnostic indicator of a particular lesion. Artificial neural networks, employed for the detection of lymphocytes, not only can provide support to the work of histopathologists but also enable better...

    Full text to download in external service

  • Automatic Emotion Recognition in Children with Autism: A Systematic Literature Review

    Publication

    - SENSORS - Year 2022

    The automatic emotion recognition domain brings new methods and technologies that might be used to enhance therapy of children with autism. The paper aims at the exploration of methods and tools used to recognize emotions in children. It presents a literature review study that was performed using a systematic approach and PRISMA methodology for reporting quantitative and qualitative results. Diverse observation channels and modalities...

    Full text available to download

  • Deep neural networks for data analysis 24/25

    e-Learning Courses
    • J. Cychnerski
    • K. Draszawka

    This course covers an introduction to supervised machine learning, the construction of basic artificial deep neural networks (DNNs) and basic training algorithms, as well as an overview of popular DNN architectures (convolutional networks, recurrent networks, transformers). The course introduces students to popular regularization techniques for deep models. Besides theory, a large part of the course is a project in which students apply...

  • Examining Feature Vector for Phoneme Recognition

    Publication

    - Year 2018

    The aim of this paper is to analyze the usability of descriptors from music information retrieval for phoneme analysis. The case study presented consists of several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...

  • Fully Automated AI-powered Contactless Cough Detection based on Pixel Value Dynamics Occurring within Facial Regions

    Publication

    - Year 2021

    Increased interest in non-contact evaluation of the health state has led to higher expectations for delivering automated and reliable solutions that can be conveniently used during daily activities. Although some solutions for cough detection exist, they suffer from a series of limitations. Some of them rely on gesture or body pose recognition, which might not be possible in cases of occlusions, closer camera distances or impediments...

    Full text to download in external service

  • Face with Mask Detection in Thermal Images Using Deep Neural Networks

    Publication

    As the interest in facial detection grows, especially during a pandemic, solutions are sought that will be effective and bring more benefits. This is the case with the use of thermal imaging, which is resistant to environmental factors and makes it possible, for example, to determine the temperature based on the detected face, which brings new perspectives and opportunities to use such an approach for health control purposes. The...

    Full text available to download

  • Mobilenet-V2 Enhanced Parkinson's Disease Prediction with Hybrid Data Integration

    Publication

    - Year 2024

    This study investigates the role of deep learning models, particularly MobileNet-v2, in Parkinson's Disease (PD) detection through handwriting spiral analysis. Handwriting difficulties often signal early signs of PD, necessitating early detection tools due to potential impacts on patients' work capacities. The study utilizes a three-fold approach, including data augmentation, algorithm development for simulated PD image datasets,...

    Full text to download in external service

  • Abdalraheem Ijjeh Ph.D. Eng.

    People

    The primary research areas of interest are artificial intelligence (AI), machine learning, deep learning, and computer vision, as well as modeling physical phenomena (i.e., guided waves in composite laminates). The research interests described above are utilized for SHM and NDE applications, namely damage detection and localization in composite materials.  

  • Breast MRI segmentation by deep learning: key gaps and challenges

    Publication

    Breast MRI segmentation plays a vital role in early diagnosis and treatment planning of breast anomalies. Convolutional neural networks with deep learning have indicated promise in automating this process, but significant gaps and challenges remain to address. This PubMed-based review provides a comprehensive literature overview of the latest deep learning models used for breast segmentation. The article categorizes the literature...

    Full text available to download

  • Detecting Objects of Various Categories in Optical Remote Sensing Imagery Using Neural Networks

    Publication

    - Year 2024

    The effective detection of objects in remote sensing images is of great research importance, so recent years have seen significant progress in deep learning techniques in this field. However, despite much valuable research being conducted, many challenges still remain. Many research projects focus on detecting objects of a single category (class), while correctly detecting objects of different categories is much harder. The...

    Full text to download in external service

  • Assessing the attractiveness of human face based on machine learning

    Publication

    The attractiveness of the face plays an important role in everyday life, especially in the modern world where social media and the Internet surround us. In this study, an attempt to assess the attractiveness of a face by machine learning is shown. Attractiveness is determined by three deep models whose sum of predictions is the final score. Two annotated datasets available in the literature are employed for training and testing...

    Full text available to download

  • Examining Feature Vector for Phoneme Recognition / Analiza parametrów w kontekście automatycznej klasyfikacji fonemów

    Publication

    - Year 2017

    The aim of this paper is to analyze the usability of descriptors from music information retrieval for phoneme analysis. The case study presented consists of several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...

  • DEVELOPMENT OF THE ALGORITHM OF POLISH LANGUAGE FILM REVIEWS PREPROCESSING

    An algorithm and software for preprocessing film reviews in the Polish language were developed. The algorithm contains the following steps: Text Adaptation Procedure; Procedure of Tokenization; Procedure of Transforming Words into the Byte Format; Part-of-Speech Tagging; Stemming / Lemmatization Procedure; Presentation of Documents in the Vector Form (Vector Space Model) Procedure; Forming...

    Full text available to download

  • Affective Learning Manifesto – 10 Years Later

    Publication

    - Year 2014

    In 2004 a group of affective computing researchers proclaimed a manifesto of affective learning that outlined the prospects and white spots of research at that time. Ten years have passed, and affective computing has developed many methods and tools for tracking human emotional states as well as models for constructing affective systems. There are multiple examples of applications of affective methods in Intelligent Tutoring Systems (ITS)....

  • Sensing Direction of Human Motion Using Single-Input-Single-Output (SISO) Channel Model and Neural Networks

    Publication

    - IEEE Access - Year 2022

    Object detection Through-the-Walls enables localization and identification of hidden objects behind the walls. While numerous studies have exploited Channel State Information of Multiple Input Multiple Output (MIMO) WiFi and radar devices in association with Artificial Intelligence based algorithms (AI) to detect and localize objects behind walls, this study proposes a novel non-invasive Through-the-Walls human motion direction...

    Full text available to download

  • A low complexity double-talk detector based on the signal envelope

    A new algorithm for double-talk detection, intended for use in the acoustic echo canceller for voice communication applications, is proposed. The communication system developed by the authors required the use of a double-talk detection algorithm with low complexity and good accuracy. The authors propose an approach to double-talk detection based on the signal envelopes. For each of three signals: the far-end speech, the microphone...

    Full text available to download

  • English Language Learning Employing Developments in Multimedia IS

    Publication

    In the realm of the development of information systems related to education, integrating multimedia technologies offers novel ways to enhance foreign language learning. This study investigates audio-video processing methods that leverage real-time speech rate adjustment and dynamic captioning to support English language acquisition. Through a mixed-methods analysis involving participants from a language school, we explore the impact...

    Full text to download in external service

  • Tool Wear Monitoring Using Improved Dragonfly Optimization Algorithm and Deep Belief Network

    Publication
    • L. Gertrude David
    • R. Kumar Patra
    • P. Falkowski-Gilski
    • P. Bidare Divakarachari
    • L. J. Antony Marcilin

    - Applied Sciences-Basel - Year 2022

    In recent decades, tool wear monitoring has played a crucial role in the improvement of industrial production quality and efficiency. In the machining process, it is important to predict both tool cost and life, and to reduce the equipment downtime. The conventional methods need enormous quantities of human resources and expert skills to achieve precise tool wear information. To automatically identify the tool wear types, deep...

    Full text available to download

  • Automated Parking Management for Urban Efficiency: A Comprehensive Approach

    Publication

    - Year 2024

    Effective parking management is essential for addressing the challenges of traffic congestion, city logistics, and air pollution in densely populated urban areas. This paper presents an algorithm designed to optimize parking management within city environments. The proposed system leverages deep learning models to accurately detect and classify street elements and events. Various algorithms, including automatic segmentation of...

    Full text to download in external service

  • Federated Learning in Healthcare Industry: Mammography Case Study

    The paper focuses on the role of federated learning in a healthcare environment. The experimental setup involved different healthcare providers, each with their datasets. A comparison was made between training a deep learning model using traditional methods, where all the data is stored in one place, and using federated learning, where the data is distributed among the workers. The experiment aimed to identify possible challenges...

    Full text to download in external service

  • Analyzing the relationship between sound, color, and emotion based on subjective and machine-learning approaches

    The aim of the research is to analyze the relationship between sound, color, and emotion. For this purpose, a survey application was prepared, enabling the assignment of a color to a given speaker’s/singer’s voice recordings. Subjective tests were then conducted, enabling the respondents to assign colors to voice/singing samples. In addition, a database of voice/singing recordings of people speaking in a natural way and with expressed...

    Full text available to download