Search results for: speech recognition systems - Bridge of Knowledge

Search

Search results for: speech recognition systems

Search results for: speech recognition systems

  • Improving the quality of speech in the conditions of noise and interference

    Publication

    The aim of the work is to present a method of intelligent modification of the speech signal with speech features expressed in noise, based on the Lombard effect. The recordings utilized sets of words and sentences as well as disturbing signals, i.e., pink noise and the so-called babble speech. Noise signal, calibrated to various levels at the speaker's ears, was played over two loudspeakers located 2 m away from the speaker. In...

    Full text to download in external service

  • Feature extraction in detection and recognition of graphical objects

    Publication

    - Year 2022

    Detection and recognition of graphic objects in images are of great and growing importance in many areas, such as medical and industrial diagnostics, control systems in automation and robotics, or various types of security systems, including biometric security systems related to the recognition of the face or iris of the eye. In addition, there are all systems that facilitate the personal life of the blind people, visually impaired...

  • Applying the Lombard Effect to Speech-in-Noise Communication

    Publication

    - Electronics - Year 2023

    This study explored how the Lombard effect, a natural or artificial increase in speech loudness in noisy environments, can improve speech-in-noise communication. This study consisted of several experiments that measured the impact of different types of noise on synthesizing the Lombard effect. The main steps were as follows: first, a dataset of speech samples with and without the Lombard effect was collected in a controlled setting;...

    Full text available to download

  • Constructing a Dataset of Speech Recordingswith Lombard Effect

    Publication

    - Year 2020

    Thepurpose of therecordings was to create a speech corpus based on the ISLEdataset, extended with video and Lombard speech. Selected from a set of 165sentences, 10, evaluatedas having thehighest possibility to occur in the context ofthe Lombard effect,were repeated in the presence of the so-called babble speech to obtain Lombard speech features. Altogether,15speakers were recorded, and speech parameterswere...

  • Virtual keyboard controlled by eye gaze employing speech synthesis

    Publication

    - Year 2010

    The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

  • Virtual Keyboard controlled by eye gaze employing speech synthesis

    The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

    Full text to download in external service

  • Improved method for real-time speech stretching

    Publication

    n algorithm for real-time speech stretching is presented. It was designed to modify input signal dependently on its content and on its relation with the historical input data. The proposed algorithm is a combination of speech signal analysis algorithms, i.e. voice, vowels/consonants, stuttering detection and SOLA (Synchronous-Overlap-and-Add) based speech stretching algorithm. This approach enables stretching input speech signal...

    Full text to download in external service

  • Building Knowledge for the Purpose of Lip Speech Identification

    Consecutive stages of building knowledge for automatic lip speech identification are shown in this study. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared taking into account the frequencies of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...

    Full text to download in external service

  • Real-time speech-rate modification experiments

    Publication

    An algorithm designed for real-time speech time scale modification (stretching) is proposed, providing a combination of typical synchronous overlap and add based time scale modification algorithm and signal redundancy detection algorithms that allow to remove parts of the speech signal and replace them with the stretched speech signal fragments. Effectiveness of signal processing algorithms are examined experimentally together...

    Full text to download in external service

  • Koncepcja analizy stanów emocjonalnych użytkowników w kontekście systemów zabezpieczeń transportowych

    Autorzy, przywołując własne i światowe badania nad rozpoznawaniem emocji ludzkich z obrazu twarzy, wskazują na możliwość zastosowania algorytmów komputerowych i ich implementacji w komputerach osobistych (i innych urządzeniach personalnych wyposażonych w dostatecznie silny procesor obliczeniowy). Zastosowanie takiego rozwiązania może poprawić bezpieczeństwo użytkowania urządzeń, maszyn i pojazdów, których operatorzy muszą gwarantować...

    Full text available to download

  • Improving Objective Speech Quality Indicators in Noise Conditions

    Publication

    - Year 2020

    This work aims at modifying speech signal samples and test them with objective speech quality indicators after mixing the original signals with noise or with an interfering signal. Modifications that are applied to the signal are related to the Lombard speech characteristics, i.e., pitch shifting, utterance duration changes, vocal tract scaling, manipulation of formants. A set of words and sentences in Polish, recorded in silence,...

    Full text to download in external service

  • Emotion Recognition for Affect Aware Video Games

    In this paper the idea of affect aware video games is presented. A brief review of automatic multimodal affect recognition of facial expressions and emotions is given. The first result of emotions recognition using depth data as well as prototype affect aware video game are presented

    Full text to download in external service

  • Emotion Recognition and Its Applications

    The paper proposes a set of research scenarios to be applied in four domains: software engineering, website customization, education and gaming. The goal of applying the scenarios is to assess the possibility of using emotion recognition methods in these areas. It also points out the problems of defining sets of emotions to be recognized in different applications, representing the defined emotional states, gathering the data and...

    Full text to download in external service

  • Automatic sound recognition for security purposes

    Publication

    - Year 2008

    In the paper an automatic sound recognition system is presented. It forms a part of a bigger security system developed in order to monitor outdoor places for non-typical audio-visual events. The analyzed audio signal is being recorded from a microphone mounted in an outdoor place thus a non stationary noise of a significant energy is present in it. In the paper an especially designed algorithm for outdoor noise reduction is presented,...

  • Speech synthesis controlled by eye gazing

    Publication

    - Year 2010

    A method of communication based on eye gaze controlling is presented. Investigations of using gaze tracking have been carried out in various context applications. The solution proposed in the paper could be referred to as ''talking by eyes'' providing an innovative approach in the domain of speech synthesis. The application proposed is dedicated to disabled people, especially to persons in a so-called locked-in syndrome who cannot...

  • Acoustic Sensing Analytics Applied to Speech in Reverberation Conditions

    Publication

    The paper aims to discuss a case study of sensing analytics and technology in acoustics when applied to reverberation conditions. Reverberation is one of the issues that makes speech in indoor spaces challenging to understand. This problem is particularly critical in large spaces with few absorbing or diffusing surfaces. One of the natural remedies to improve speech intelligibility in such conditions may be achieved through speaking...

    Full text available to download

  • Time-domain prosodic modifications for text-to-speech synthesizer

    Publication

    - Year 2010

    An application of prosodic speech processing algorithms to Text-To-Speech synthesis is presented. Prosodic modifications that improve the naturalness of the synthesized signal are discussed. The applied method is based on the TD-PSOLA algorithm. The developed Text-To-Speech Synthesizer is used in applications employing multimodal computer interfaces.

  • A Method of Real-Time Non-uniform Speech Stretching

    Publication

    Developed method of real-time non-uniform speech stretching is presented.The proposed solution is based on the well-known SOLA algorithm(Synchronous Overlap and Add). Non-uniform time-scale modification isachieved by the adjustment of time scaling factor values in accordance with thesignal content. Dependently on the speech unit (vowels/consonants), instantaneousrate of speech (ROS), and speech signal presence, values of the scalingfactor...

    Full text to download in external service

  • Distortion of speech signals in the listening area: its mechanism and measurements

    Publication

    - Year 2014

    The paper deals with a problem of the influence of the number and distribution of loudspeakers in speech reinforcement systems on the quality of publicly addressed voice messages, namely on speech intelligibility in the listening area. Linear superposition of time-shifted broadband waves of a same form and slightly different magnitudes that reach a listener from numerous coherent sources, is accompanied by interference effects...

    Full text to download in external service

  • Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

    Publication
    • D. Korzekwa
    • R. Barra-Chicote
    • B. Kostek
    • T. Drugman
    • M. Łajszczak

    - Year 2019

    We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...

    Full text available to download

  • Uncertainty in emotion recognition

    Purpose–The purpose of this paper is to explore uncertainty inherent in emotion recognition technologiesand the consequences resulting from that phenomenon.Design/methodology/approach–The paper is a general overview of the concept; however, it is basedon a meta-analysis of multiple experimental and observational studies performed over the past couple of years.Findings–The mainfinding of the paper might be summarized as follows:...

    Full text to download in external service

  • Comparison of various speech time-scale modificartion methods

    The objective of this work is to investigate the influence of the different time-scale modification (TSM) methods on the quality of the speech stretched up using the designed non-uniform real-time speech time-scale modification algorithm (NU-RTSM). The algorithm provides a combination of the typical TSM algorithm with the vowels, consonants, stutter, transients and silence detectors. Based on the information about the content and...

  • Speech codec enhancements utilizing time compression and perceptual coding

    Publication

    A method for encoding wideband speech signal employing standardized narrowband speech codecs is presented as well as experimental results concerning detection of tonal spectral components. The speech signal sampled with a higher sampling rate than it is suitable for narrowband coding algorithm is compressed in order to decrease the amount of samples. Next, the time-compressed representation of a signal is encoded using a narrowband...

  • Multiclass AdaBoost Classifier Parameter Adaptation for Pattern Recognition

    The article presents the problem of parameter value selection of the multiclass ``one against all'' approach of an AdaBoost algorithm in tasks of object recognition based on two-dimensional graphical images. AdaBoost classifier with Haar features is still used in mobile devices due to the processing speed in contrast to other methods like deep learning or SVM but its main drawback is the need to assembly the results of binary...

    Full text to download in external service

  • Tensor Decomposition for Imagined Speech Discrimination in EEG

    Publication

    - LECTURE NOTES IN COMPUTER SCIENCE - Year 2018

    Most of the researches in Electroencephalogram(EEG)-based Brain-Computer Interfaces (BCI) are focused on the use of motor imagery. As an attempt to improve the control of these interfaces, the use of language instead of movement has been recently explored, in the form of imagined speech. This work aims for the discrimination of imagined words in electroencephalogram signals. For this purpose, the analysis of multiple variables...

    Full text to download in external service

  • Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit

    Methods developed for real-time time scale modification (TSM) of speech signal are presented. They are based onthe non-uniform, speech rate depended SOLA algorithm (Synchronous Overlap and Add). Influence of theproposed method on the intelligibility of speech was investigated for two separate groups of listeners, i.e. hearingimpaired children and elderly listeners. It was shown that for the speech with average rate equal to or...

    Full text available to download

  • Recognition and sensing of anions

    Publication

    Molecular ion recognition is one of the most intensively studied areas of supramolecular technology. The reason for this is the essential role that ions play in many biological as well as industrial processes. On the other hand, however, it has been proved that ions can have a negative impact on human health and the environment. For these reasons, it is extremly important to develop rapid and simple methods allowing the determination...

  • Rough Sets Applied to Mood of Music Recognition

    Publication

    - Year 2016

    With the growth of accessible digital music libraries over the past decade, there is a need for research into automated systems for searching, organizing and recommending music. Mood of music is considered as one of the most intuitive criteria for listeners, thus this work is focused on the emotional content of music and its automatic recognition. The research study presented in this work contains an attempt to music emotion recognition...

  • Integration in Multichannel Emotion Recognition

    Publication

    - Year 2018

    The paper concerns integration of results provided by automatic emotion recognition algorithms. It presents both the challenges and the approaches to solve them. Paper shows experimental results of integration. The paper might be of interest to researchers and practitioners who deal with automatic emotion recognition and use more than one solution or multichannel observation.

    Full text available to download

  • Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically

    Publication

    - Year 2022

    The aim of this study is two-fold. First, we perform a series of experiments to examine the interference of different noises on speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Then, we apply this knowledge to detecting speech with the Lombard effect. This is for preparing a dataset for training a machine learning-based...

    Full text available to download

  • Mining inconsistent emotion recognition results with the multidimensional model

    Publication

    - IEEE Access - Year 2021

    The paper deals with the challenge of inconsistency in multichannel emotion recognition. The focus of the paper is to explore factors that might influence the inconsistency. The paper reports an experiment that used multi-camera facial expression analysis with multiple recognition systems. The data were analyzed using a multidimensional approach and data mining techniques. The study allowed us to explore camera location, occlusions...

    Full text available to download

  • Elimination of clicks from archive speech signals using sparse autoregressive modeling

    Publication

    This paper presents a new approach to elimination of impulsivedisturbances from archive speech signals. The proposedsparse autoregressive (SAR) signal representation is given ina factorized form - the model is a cascade of the so-called formantfilter and pitch filter. Such a technique has been widelyused in code-excited linear prediction (CELP) systems, as itguarantees model stability. After detection of noise pulses usinglinear...

    Full text to download in external service

  • Ranking Speech Features for Their Usage in Singing Emotion Classification

    Publication

    - Year 2020

    This paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...

    Full text available to download

  • Limitations of Emotion Recognition in Software User Experience Evaluation Context

    This paper concerns how an affective-behavioural- cognitive approach applies to the evaluation of the software user experience. Although it may seem that affect recognition solutions are accurate in determining the user experience, there are several challenges in practice. This paper aims to explore the limitations of the automatic affect recognition applied in the usability context as well as...

    Full text available to download

  • Human emotion recognition with biosignals

    Publication

    - Year 2022

    This chapter presents issues in the field of affective computing. Basic preliminary information for the recognition of emotions is given and models of emotions, various ways of evoking emotions, as well as their theoretical foundations are discussed. The particular attention is given to the use of physiological signals in recognizing emotions. This subject is outlined further below by presenting selected biosignals, their relationship...

    Full text to download in external service

  • System Supporting Speech Perception in Special Educational Needs Schoolchildren

    Publication

    - Year 2012

    The system supporting speech perception during the classes is presented in the paper. The system is a combination of portable device, which enables real-time speech stretching, with the workstation designed in order to perform hearing tests. System was designed to help children suffering from Central Auditory Processing Disorders.

    Full text to download in external service

  • High quality speech codec employing sines+noise+transients model

    A method of high quality wideband speech signal representation employing sines+transients+noise model is presented. The need for a wideband speech coding approach as well as various methods for analysis and synthesis of sines, residual and transient states of speech signal is discussed. The perceptual criterion is applied in the proposed approach during encoding of sines amplitudes in order to reduce bandwidth requirements and...

    Full text available to download

  • Silence/noise detection for speech and music signals

    Publication

    - Year 2008

    This paper introduces a novel off-line algorithm for silence/noise detection in noisy signals. The main concept of the proposed algorithm is to provide noise patterns for further signals processing i.e. noise reduction for speech enhancement. The algorithm is based on frequency domain characteristics of signals. The examples of different types of noisy signals are presented.

  • Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions

    Publication

    - Year 2018

    The aim of the work is to analyze Lombard speech effect in recordings and then modify the speech signal in order to obtain an increase in the improvement of objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of the Lombard speech, and in particular on the effect of increasing the fundamental frequency...

  • A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times

    Publication

    Objective assessment of speech intelligibility is a complex task that requires taking into account a number of factors such as different perception of each speech sub-bands by the human hearing sense or different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method used for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way...

    Full text available to download

  • Database of speech and facial expressions recorded with optimized face motion capture settings

    The broad objective of the present research is the analysis of spoken English employing a multiplicity of modalities. An important stage of this process, discussed in the paper, is creating a database of speech accompanied with facial expressions. Recordings of speakers were made using an advanced system for capturing facial muscle motion. A brief historical outline, current applications, limitations and the ways of capturing face...

    Full text available to download

  • Corrupted speech intelligibility improvement using adaptive filter based algorithm

    Publication

    A technique for improving the quality of speech signals recorded in strong noise is presented. The proposed algorithmemploying adaptive filtration is described and additional possibilities of speech intelligibility improvement arediscussed. Results of the tests are presented.

  • Jarosław Sadowski dr hab. inż.

  • A non-uniform real-time speech time-scale stretching method

    Publication

    An algorithm for non-uniform real-time speech stretching is presented. It provides a combination of typical SOLA algorithm (Synchronous Overlap and Add ) with the vowels, consonants and silence detectors. Based on the information about the content and the estimated value of the rate of speech (ROS), the algorithm adapts the scaling factor value. The ability of real-time speech stretching and the resultant quality of voice were...

  • Hand gesture recognition supported by fuzzy rules and Kalman filters

    The paper presents a system based on camera and multimediaprojector enabling a user to control computer applications by dynamic hand gestures. Gesture recognition methodology based on representing hand movement trajectory by motion vectors analysed using fuzzy rule-based inference is first given. For effective hand position tracking Kalman filters are employed. The system engineered is developed using J2SE and C++/OpenCV technology....

  • Recognition of Hand Drawn Flowcharts

    Publication

    - Year 2013

    In this paper the problem of hand drawn flowcharts recognition is presented. There are described two attitudes to this problem: on-line and off-line. A concept of FCE, a system for recognizing and understanding of freehand drawn on-line flow charts on desktop computer and mobile devices is presented. The first experiments with the FCE system and the planes for future are also described.

  • Intelligent processing of stuttered speech.

    W artykule zaprezentowano kilka metod analizy i automatycznego zliczania potknięć artykulacyjnych, związanych z jąkaniem się, opartych na wykorzystaniu algorytmów uczących się sztucznych sieci neuronowych i zbiorów przybliżonych.

  • Using Physiological Signals for Emotion Recognition

    Publication

    - Year 2013

    Recognizing user’s emotions is the promising area of research in a field of human-computer interaction. It is possible to recognize emotions using facial expression, audio signals, body poses, gestures etc. but physiological signals are very useful in this field because they are spontaneous and not controllable. In this paper a problem of using physiological signals for emotion recognition is presented. The kinds of physiological...

    Full text to download in external service

  • Study on Speech Transmission under Varying QoS Parameters in a OFDM Communication System

    Publication

    - Year 2021

    Although there has been an outbreak of multiple multimedia platforms worldwide, speech communication is still the most essential and important type of service. With the spoken word we can exchange ideas, provide descriptive information, as well as aid to another person. As the amount of available bandwidth continues to shrink, researchers focus on novel types of transmission, based most often on multi-valued modulations, multiple...

    Full text to download in external service

  • Communication Platform for Evaluation of Transmitted Speech Quality

    A voice communication system designed and implemented is described. The purpose of the presented platform was to enable a series of experiments related to the quality assessment of algorithms used in the coding and transmitting of speech. The system is equipped with tools for recording signals at each stage of processing, making it possible to subject them to subjective assessments by listening tests or, objective evaluation employing...

    Full text available to download