Search results for: VISUAL SPEECH RECOGNITION - MOST Wiedzy


  • Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

    Publication
    • D. Korzekwa
    • R. Barra-Chicote
    • B. Kostek
    • T. Drugman
    • M. Łajszczak

    - Year 2019

    We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model's key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...

    Full text available for download on the portal

  • Uncertainty in emotion recognition

    Purpose – The purpose of this paper is to explore the uncertainty inherent in emotion recognition technologies and the consequences resulting from that phenomenon. Design/methodology/approach – The paper is a general overview of the concept; however, it is based on a meta-analysis of multiple experimental and observational studies performed over the past couple of years. Findings – The main finding of the paper might be summarized as follows:...

    Full text available for download from an external service

  • Comparison of various speech time-scale modification methods

    The objective of this work is to investigate the influence of the different time-scale modification (TSM) methods on the quality of the speech stretched up using the designed non-uniform real-time speech time-scale modification algorithm (NU-RTSM). The algorithm provides a combination of the typical TSM algorithm with the vowels, consonants, stutter, transients and silence detectors. Based on the information about the content and...

  • Speech codec enhancements utilizing time compression and perceptual coding

    Publication

    A method for encoding a wideband speech signal employing standardized narrowband speech codecs is presented, as well as experimental results concerning the detection of tonal spectral components. The speech signal, sampled at a higher rate than is suitable for a narrowband coding algorithm, is compressed in order to decrease the number of samples. Next, the time-compressed representation of the signal is encoded using a narrowband...

  • Tensor Decomposition for Imagined Speech Discrimination in EEG

    Publication

    - LECTURE NOTES IN COMPUTER SCIENCE - Year 2018

    Most research on electroencephalogram (EEG)-based Brain-Computer Interfaces (BCI) focuses on the use of motor imagery. In an attempt to improve the control of these interfaces, the use of language instead of movement has recently been explored, in the form of imagined speech. This work aims at the discrimination of imagined words in electroencephalogram signals. For this purpose, the analysis of multiple variables...

    Full text available for download from an external service

  • New Applications of Multimodal Human-Computer Interfaces

    Publication

    - Year 2012

    Multimodal computer interfaces and examples of their applications to education software and for disabled people are presented. The proposed interfaces include an interactive electronic whiteboard based on video image analysis, an application for controlling computers with gestures, and an audio interface for speech stretching for hearing-impaired and stuttering people. Application of the eye-gaze tracking system to awareness...

  • Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit

    Methods developed for real-time time-scale modification (TSM) of speech signals are presented. They are based on the non-uniform, speech-rate-dependent SOLA (Synchronous Overlap and Add) algorithm. The influence of the proposed method on the intelligibility of speech was investigated for two separate groups of listeners, i.e. hearing-impaired children and elderly listeners. It was shown that for the speech with average rate equal to or...

    Full text available for download on the portal

  • Augmented Reality for Privacy-Sensitive Visual Monitoring

    Publication

    - Year 2014

    The paper presents a method for video anonymization and for replacing real human silhouettes with virtual 3D figures rendered on the screen. The video stream is processed to detect and track objects, whereas the anonymization stage employs a fast blurring method. Substitute 3D figures are animated according to the behavior of the detected persons. Their location, movement speed, direction, and person height are taken into account during the animation...

    Full text available for download from an external service

  • Recognition and sensing of anions

    Publication

    Molecular ion recognition is one of the most intensively studied areas of supramolecular technology. The reason for this is the essential role that ions play in many biological as well as industrial processes. On the other hand, however, it has been proved that ions can have a negative impact on human health and the environment. For these reasons, it is extremely important to develop rapid and simple methods allowing the determination...

  • Exploiting audio-visual correlation by means of gaze tracking

    This paper presents a novel means of increasing the reliability of audio-visual correlation analysis. This is done based on gaze tracking technology engineered at the Multimedia Systems Department of the Gdansk University of Technology, Poland. In the paper, the history of and current research in the area of audio-visual perception analysis are briefly reviewed. Then the methodology employing gaze tracking is presented along with the...

    Full text available for download on the portal

  • Special forms of echo visual representation in an ahead looking sonar.

    The paper discusses ways to organise visual representation in multi-beam ahead-looking sonars whose function is to detect objects on the bottom and in pelagic zones. Forms of visual representation are shown and illustrated on the basic screen (panoramic representation and setting, alarms) and on the auxiliary screen (type A, B and special). Special forms of visual representation are mainly used in detecting objects in difficult...

    Full text available for download on the portal

  • Integration in Multichannel Emotion Recognition

    Publication

    - Year 2018

    The paper concerns the integration of results provided by automatic emotion recognition algorithms. It presents both the challenges and the approaches to solving them. The paper shows experimental results of integration. It might be of interest to researchers and practitioners who deal with automatic emotion recognition and use more than one solution or multichannel observation.

    Full text available for download on the portal

  • Visual Management as the support in building the concept of continuous improvement in the enterprise

    The following article presents one of the selected tools of the Lean Management concept – visual management. This method enables enterprises to strengthen their process of continuous improvement. With the support of visual management, the managerial board can manage information more effectively and improve the communication process within a particular company. In the first part, the author describes the concept...

    Full text available for download on the portal

  • Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically

    Publication

    - Year 2022

    The aim of this study is two-fold. First, we perform a series of experiments to examine the interference of different noises on speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Then, we apply this knowledge to detecting speech with the Lombard effect. This is for preparing a dataset for training a machine learning-based...

    Full text available for download on the portal

  • Ranking Speech Features for Their Usage in Singing Emotion Classification

    Publication

    This paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...

    Full text available for download on the portal

  • Human emotion recognition with biosignals

    Publication

    - Year 2022

    This chapter presents issues in the field of affective computing. Basic preliminary information on the recognition of emotions is given, and models of emotions, various ways of evoking emotions, as well as their theoretical foundations are discussed. Particular attention is given to the use of physiological signals in recognizing emotions. This subject is outlined further below by presenting selected biosignals, their relationship...

    Full text available for download from an external service

  • System Supporting Speech Perception in Special Educational Needs Schoolchildren

    Publication

    - Year 2012

    A system supporting speech perception during classes is presented in the paper. The system combines a portable device, which enables real-time speech stretching, with a workstation designed to perform hearing tests. The system was designed to help children suffering from Central Auditory Processing Disorders.

    Full text available for download from an external service

  • High quality speech codec employing sines+noise+transients model

    A method of high quality wideband speech signal representation employing sines+transients+noise model is presented. The need for a wideband speech coding approach as well as various methods for analysis and synthesis of sines, residual and transient states of speech signal is discussed. The perceptual criterion is applied in the proposed approach during encoding of sines amplitudes in order to reduce bandwidth requirements and...

    Full text available for download on the portal

  • Visual and Auditory Attention Stimulator for Assisting Pedagogical Therapy

    Publication

    - Year 2018

    The visual and auditory attention stimulator is a system developed in order to improve reading skills using simultaneous presentation of text in its visual form and in transformed auditory form, accompanied by related movie material. The described research involved 40 children aged 8-13 years with difficulties in learning to read, who were diagnosed as having developmental dyslexia. It was shown that application...

    Full text available for download on the portal

  • Silence/noise detection for speech and music signals

    Publication

    - Year 2008

    This paper introduces a novel off-line algorithm for silence/noise detection in noisy signals. The main concept of the proposed algorithm is to provide noise patterns for further signal processing, i.e. noise reduction for speech enhancement. The algorithm is based on frequency-domain characteristics of signals. Examples of different types of noisy signals are presented.

  • Virtual keyboard controlled by eye gaze employing speech synthesis

    Publication

    The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

  • Virtual Keyboard controlled by eye gaze employing speech synthesis

    The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

    Full text available for download from an external service

  • Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions

    Publication

    - Year 2018

    The aim of the work is to analyze the Lombard speech effect in recordings and then modify the speech signal in order to improve objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of Lombard speech, and in particular on the effect of increasing the fundamental frequency...

  • Visual Data Encryption for Privacy Enhancement in Surveillance Systems

    Publication

    In this paper a methodology for employing reversible visual encryption of data is proposed. The developed algorithms are focused on privacy enhancement in distributed surveillance architectures. First, the motivation for the study and a short review of preexisting methods of privacy enhancement are presented. The algorithmic background, system architecture along with a solution for anonymization of sensitive regions of interest...

    Full text available for download from an external service

  • Corrupted speech intelligibility improvement using adaptive filter based algorithm

    Publication

    A technique for improving the quality of speech signals recorded in strong noise is presented. The proposed algorithm employing adaptive filtration is described and additional possibilities of speech intelligibility improvement are discussed. Results of the tests are presented.

  • Distortion of speech signals in the listening area: its mechanism and measurements

    Publication

    - Year 2014

    The paper deals with the problem of the influence of the number and distribution of loudspeakers in speech reinforcement systems on the quality of publicly addressed voice messages, namely on speech intelligibility in the listening area. Linear superposition of time-shifted broadband waves of the same form and slightly different magnitudes that reach a listener from numerous coherent sources is accompanied by interference effects...

    Full text available for download from an external service

  • Visual content representation and retrieval for Cognitive Cyber Physical Systems

    Publication

    - Procedia Computer Science - Year 2019

    Cognitive Cyber Physical Systems have gained significant attention from academia and industry during the past few decades. One of the main reasons behind this interest is the potential of such technologies to revolutionize human life, since they intend to work robustly under complex visual scenes, in which environmental conditions may vary, adapting to a comprehensive range of unforeseen changes, and exhibiting prospective behavior...

    Full text available for download on the portal

  • A non-uniform real-time speech time-scale stretching method

    Publication

    An algorithm for non-uniform real-time speech stretching is presented. It provides a combination of the typical SOLA algorithm (Synchronous Overlap and Add) with vowel, consonant and silence detectors. Based on the information about the content and the estimated value of the rate of speech (ROS), the algorithm adapts the scaling factor value. The ability of real-time speech stretching and the resultant quality of voice were...

  • Recognition of Hand Drawn Flowcharts

    Publication

    - Year 2013

    In this paper the problem of recognizing hand-drawn flowcharts is presented. Two approaches to this problem are described: on-line and off-line. A concept of FCE, a system for recognizing and understanding freehand-drawn on-line flowcharts on desktop computers and mobile devices, is presented. The first experiments with the FCE system and plans for the future are also described.

  • Semantic Integration of Heterogeneous Recognition Systems

    Publication

    - LECTURE NOTES IN COMPUTER SCIENCE - Year 2011

    Computer perception of real-life situations is performed using a variety of recognition techniques, including video-based computer vision, biometric systems, RFID devices and others. The proliferation of recognition modules enables development of complex systems by integration of existing components, analogously to the Service Oriented Architecture technology. In the paper, we propose a method that enables integration of information...

  • Using Physiological Signals for Emotion Recognition

    Publication

    - Year 2013

    Recognizing a user's emotions is a promising area of research in the field of human-computer interaction. It is possible to recognize emotions using facial expressions, audio signals, body poses, gestures etc., but physiological signals are very useful in this field because they are spontaneous and not controllable. In this paper the problem of using physiological signals for emotion recognition is presented. The kinds of physiological...

    Full text available for download from an external service

  • Communication Platform for Evaluation of Transmitted Speech Quality

    A designed and implemented voice communication system is described. The purpose of the presented platform was to enable a series of experiments related to the quality assessment of algorithms used in the coding and transmission of speech. The system is equipped with tools for recording signals at each stage of processing, making it possible to subject them to subjective assessment by listening tests or to objective evaluation employing...

    Full text available for download on the portal

  • Emotion Recognition for Affect Aware Video Games

    In this paper the idea of affect-aware video games is presented. A brief review of automatic multimodal affect recognition of facial expressions and emotions is given. The first results of emotion recognition using depth data, as well as a prototype affect-aware video game, are presented.

    Full text available for download from an external service

  • Emotion Recognition and Its Applications

    The paper proposes a set of research scenarios to be applied in four domains: software engineering, website customization, education and gaming. The goal of applying the scenarios is to assess the possibility of using emotion recognition methods in these areas. It also points out the problems of defining sets of emotions to be recognized in different applications, representing the defined emotional states, gathering the data and...

    Full text available for download from an external service

  • Visual perception of vowels from static and dynamic cues

    The purpose of the study was to analyse human identification of Polish vowels from static and dynamic durationally slowed visual cues. A total of 152 participants identified 6 Polish vowels produced by 4 speakers from static (still images) and dynamic (videos) cues. The results show that 59% of static vowels and 63% of dynamic vowels were successfully identified. There was a strong confusion between vowels within front, central,...

    Full text available for download from an external service

  • Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

    Publication

    - Year 2007

    In this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). A new iterative algorithm for the analysis of the ICF of a speech signal is presented. The obtained results are compared with commonly used methods to prove its accuracy and the connection between ICF and pitch, particularly for narrowband-filtered speech signals.

  • Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

    In this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). A new iterative algorithm for the analysis of the ICF of a speech signal is presented. The obtained results are compared with commonly used methods to prove its accuracy and the connection between ICF and pitch, particularly for narrowband-filtered speech signals.

  • Rough Sets Applied to Mood of Music Recognition

    Publication

    With the growth of accessible digital music libraries over the past decade, there is a need for research into automated systems for searching, organizing and recommending music. Mood of music is considered one of the most intuitive criteria for listeners, thus this work is focused on the emotional content of music and its automatic recognition. The research study presented in this work contains an attempt at music emotion recognition...

  • Automated detection of pronunciation errors in non-native English speech employing deep learning

    Publication

    - Year 2023

    Despite significant advances in recent years, the existing Computer-Assisted Pronunciation Training (CAPT) methods detect pronunciation errors with a relatively low accuracy (precision of 60% at 40%-80% recall). This Ph.D. work proposes novel deep learning methods for detecting pronunciation errors in non-native (L2) English speech, outperforming the state-of-the-art method in AUC metric (Area under the Curve) by 41%, i.e., from...

    Full text available for download on the portal

  • Emotion Recognition Using Physiological Signals

    Publication

    - Year 2015

    In this paper the problem of emotion recognition using physiological signals is presented. Firstly the problems with acquisition of physiological signals related to specific human emotions are described. It is not a trivial problem to elicit real emotions and to choose stimuli that always, and for all people, elicit the same emotion. Also different kinds of physiological signals for emotion recognition are considered. A set of...

    Full text available for download from an external service

  • Visual Content Representation for Cognitive Systems: Towards Augmented Intelligence

    Publication

    - Year 2020

    Cognitive Vision Systems have gained significant attention from academia and industry during the past few decades. One of the main reasons behind this interest is the potential of such technologies to revolutionize human life, since they intend to work robustly under complex visual scenes (in which environmental conditions may vary), adapting to a comprehensive range of unforeseen changes, and exhibiting prospective behavior. The combination...

    Full text available for download from an external service

  • Facial emotion recognition using depth data

    Publication

    - Year 2015

    In this paper an original approach is presented for facial expression and emotion recognition based only on the depth channel from the Microsoft Kinect sensor. The emotional user model contains nine emotions including the neutral one. The proposed recognition algorithm uses detection of local movements within the face area in order to recognize the actual facial expression. This approach has been validated on Facial Expressions and Emotions Database...

    Full text available for download from an external service

  • Emotion recognition and its application in software engineering

    In this paper a novel application of multimodal emotion recognition algorithms in software engineering is described. Several application scenarios are proposed concerning program usability testing and software process improvement. Also a set of emotional states relevant in that application area is identified. The multimodal emotion recognition method that integrates video and depth channels, physiological signals and input devices...

    Full text available for download from an external service

  • A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times

    Publication

    Objective assessment of speech intelligibility is a complex task that requires taking into account a number of factors, such as the different perception of each speech sub-band by the human hearing sense or the different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method used for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way...

    Full text available for download on the portal

  • Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech

    Publication
    • D. Korzekwa
    • J. Lorenzo-Trueba
    • T. Drugman
    • S. Calamaro
    • B. Kostek

    - Year 2021

    We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...

    Full text available for download on the portal

  • Dependable Integration of Medical Image Recognition Components

    Computer-driven medical image recognition may support medical doctors in the diagnosis process, but requires high dependability considering the potential consequences of incorrect results. The paper presents a system that improves the dependability of medical image recognition by integration of results from redundant components. The components implement alternative algorithms for recognizing diseases in the field of gastrointestinal endoscopy....

  • Feature extraction in detection and recognition of graphical objects

    Publication

    - Year 2022

    Detection and recognition of graphic objects in images are of great and growing importance in many areas, such as medical and industrial diagnostics, control systems in automation and robotics, or various types of security systems, including biometric security systems related to the recognition of the face or iris of the eye. In addition, there are all the systems that facilitate the personal life of blind people, visually impaired...

  • Mining inconsistent emotion recognition results with the multidimensional model

    Publication

    - IEEE Access - Year 2021

    The paper deals with the challenge of inconsistency in multichannel emotion recognition. The focus of the paper is to explore factors that might influence the inconsistency. The paper reports an experiment that used multi-camera facial expression analysis with multiple recognition systems. The data were analyzed using a multidimensional approach and data mining techniques. The study allowed us to explore camera location, occlusions...

    Full text available for download on the portal

  • Guido: a musical score recognition system

    Publication

    - Year 2007

    This paper presents Guido, an optical music recognition system that can automatically recognize the main musical symbols of music scores that were scanned or photographed with a digital camera. The application is based on an object model of musical notation and uses a linguistic approach for symbol interpretation and error correction. The system offers a musical editor with partially automatic error correction.

  • Mowa nienawiści (hate speech) a odpowiedzialność dostawców usług internetowych w orzecznictwie sądów europejskich [Hate speech and the liability of Internet service providers in the case law of European courts]

    Publication

    - Year 2015

    The article analyses the phenomenon of hate speech on the Internet, contrasted with the problem of the responsibility of Internet Service Providers for such abuses of freedom of expression. The text provides an analysis of the jurisprudence of two European Courts. On the one hand it presents the position of the European Court of Human Rights on the problem of hate speech: its definition and the liability for it as an exception...