Search results for: AUTOMATIC SPEECH RECOGNITION, WHISPER, MEDICAL LANGUAGE RECOGNITION, SPEECH PROCESSING
-
Building Knowledge for the Purpose of Lip Speech Identification
Publication: Consecutive stages of building knowledge for automatic lip speech identification are shown in this study. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared taking into account the frequencies of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...
-
A Method of Real-Time Non-uniform Speech Stretching
Publication: A method of real-time non-uniform speech stretching is presented. The proposed solution is based on the well-known SOLA algorithm (Synchronous Overlap and Add). Non-uniform time-scale modification is achieved by adjusting the time-scaling factor values in accordance with the signal content. Depending on the speech unit (vowels/consonants), instantaneous rate of speech (ROS), and speech signal presence, values of the scaling factor...
-
A comparative study of English viseme recognition methods and algorithms
Publication: An elementary visual unit, the viseme, is considered in this paper in the context of preparing the feature vector as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative study of their effectiveness. In the course of the study an optimal feature vector construction...
-
A comparative study of English viseme recognition methods and algorithm
Publication: An elementary visual unit, the viseme, is considered in this paper in the context of preparing the feature vector as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative study of their effectiveness. In the course of the study an optimal feature vector...
-
Automatic recognition of males and females among web browser users based on behavioural patterns of peripherals usage
Publication: Purpose: The purpose of this paper is to answer the question whether it is possible to recognise the gender of a web browser user on the basis of keystroke dynamics and mouse movements. Design/methodology/approach: An experiment was organised in order to track mouse and keyboard usage using a special web browser plug-in. After collecting the data, a number of parameters describing the users’ keystrokes, mouse movements and clicks...
-
Emotion Recognition for Affect Aware Video Games
Publication: In this paper the idea of affect-aware video games is presented. A brief review of automatic multimodal affect recognition of facial expressions and emotions is given. The first results of emotion recognition using depth data, as well as a prototype affect-aware video game, are presented.
-
COMPUTER SPEECH AND LANGUAGE
Journals -
SEMINARS IN SPEECH AND LANGUAGE
Journals -
Speech and Language Technology
Journals -
Speech Language and Hearing
Journals -
Adaptive system for recognition of sounds indicating threats to security of people and property employing parallel processing of audio data streams
Publication: A system for recognition of threatening acoustic events employing parallel processing on a supercomputing cluster is featured. The methods for detection, parameterization and classification of acoustic events are introduced. The recognition engine is based on threshold-based detection with an adaptive threshold and Support Vector Machine classification. Spectral, temporal and mel-frequency descriptors are used as signal features. The...
-
Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech
Publication: We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model's key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...
-
JOURNAL OF MEDICAL SPEECH-LANGUAGE PATHOLOGY
Journals -
Comparison of various speech time-scale modification methods
Publication: The objective of this work is to investigate the influence of different time-scale modification (TSM) methods on the quality of speech stretched using the designed non-uniform real-time speech time-scale modification algorithm (NU-RTSM). The algorithm combines a typical TSM algorithm with vowel, consonant, stutter, transient and silence detectors. Based on the information about the content and...
-
Noise profiling for speech enhancement employing machine learning models
Publication: This paper proposes a noise profiling method based on machine learning (ML) that can be performed in near real-time. To address challenges related to noise profiling effectively, we start with a critical review of the literature background. Then, we outline the experiment performed, which consists of two parts. The first part concerns the noise recognition model built upon several baseline classifiers and noise signal features...
-
Speech codec enhancements utilizing time compression and perceptual coding
Publication: A method for encoding a wideband speech signal employing standardized narrowband speech codecs is presented, as well as experimental results concerning detection of tonal spectral components. The speech signal, sampled at a higher rate than is suitable for a narrowband coding algorithm, is compressed in order to decrease the number of samples. Next, the time-compressed representation of the signal is encoded using a narrowband...
-
Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit
Publication: Methods developed for real-time time-scale modification (TSM) of speech signals are presented. They are based on the non-uniform, speech-rate-dependent SOLA algorithm (Synchronous Overlap and Add). The influence of the proposed method on the intelligibility of speech was investigated for two separate groups of listeners, i.e. hearing-impaired children and elderly listeners. It was shown that for the speech with average rate equal to or...
-
Recognition and sensing of anions
Publication: Molecular ion recognition is one of the most intensively studied areas of supramolecular technology. The reason for this is the essential role that ions play in many biological as well as industrial processes. On the other hand, it has been shown that ions can have a negative impact on human health and the environment. For these reasons, it is extremely important to develop rapid and simple methods allowing the determination...
-
Guido: a musical score recognition system
Publication: This paper presents Guido, an optical music recognition system that can automatically recognize the main musical symbols of music scores that were scanned or photographed with a digital camera. The application is based on an object model of musical notation and uses a linguistic approach for symbol interpretation and error correction. The system offers a musical editor with partially automatic error correction.
-
Feature extraction in detection and recognition of graphical objects
Publication: Detection and recognition of graphic objects in images are of great and growing importance in many areas, such as medical and industrial diagnostics, control systems in automation and robotics, or various types of security systems, including biometric security systems related to the recognition of the face or iris of the eye. In addition, there are systems that facilitate the personal lives of blind people, visually impaired...
-
Communication Platform for Evaluation of Transmitted Speech Quality
Publication: A designed and implemented voice communication system is described. The purpose of the presented platform was to enable a series of experiments related to the quality assessment of algorithms used in the coding and transmitting of speech. The system is equipped with tools for recording signals at each stage of processing, making it possible to subject them to subjective assessment by listening tests or objective evaluation employing...
-
System of speech signal processing and visualisation for linguistic purposes
Publication -
Limitations of Emotion Recognition from Facial Expressions in e-Learning Context
Publication: The paper concerns the technology of automatic emotion recognition applied in an e-learning environment. During a study of the e-learning process the authors applied facial expression observation via multiple video cameras. Preliminary analysis of the facial expressions using automatic emotion recognition tools revealed several unexpected results, including unavailability of recognition due to face coverage and significant inconsistency...
-
Ranking Speech Features for Their Usage in Singing Emotion Classification
Publication: This paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...
-
Human emotion recognition with biosignals
Publication: This chapter presents issues in the field of affective computing. Basic preliminary information for the recognition of emotions is given and models of emotions, various ways of evoking emotions, as well as their theoretical foundations are discussed. Particular attention is given to the use of physiological signals in recognizing emotions. This subject is outlined further below by presenting selected biosignals, their relationship...
-
Mining inconsistent emotion recognition results with the multidimensional model
Publication: The paper deals with the challenge of inconsistency in multichannel emotion recognition. The focus of the paper is to explore factors that might influence the inconsistency. The paper reports an experiment that used multi-camera facial expression analysis with multiple recognition systems. The data were analyzed using a multidimensional approach and data mining techniques. The study allowed us to explore camera location, occlusions...
-
High quality speech codec employing sines+noise+transients model
Publication: A method of high-quality wideband speech signal representation employing a sines+transients+noise model is presented. The need for a wideband speech coding approach, as well as various methods for analysis and synthesis of sines, residual and transient states of the speech signal, are discussed. The perceptual criterion is applied in the proposed approach during encoding of sine amplitudes in order to reduce bandwidth requirements and...
-
Virtual keyboard controlled by eye gaze employing speech synthesis
Publication: The article presents speech synthesis integrated into an eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows text to be entered for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...
-
Virtual Keyboard controlled by eye gaze employing speech synthesis
Publication: The article presents speech synthesis integrated into an eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows text to be entered for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...
-
Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions
Publication: The aim of the work is to analyze the Lombard speech effect in recordings and then modify the speech signal in order to improve objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of Lombard speech, and in particular on the effect of increasing the fundamental frequency...
-
Recognition of hazardous acoustic events employing parallel processing on a supercomputing cluster
Publication: A method for automatic recognition of hazardous acoustic events operating on a supercomputing cluster is introduced. The methods employed for detecting and classifying the acoustic events are outlined. The evaluation of the recognition engine is provided, both on the training set and using real-life signals. The algorithms yield sufficient performance in practical conditions to be employed in security surveillance systems. The...
-
Camera-based Automatic System for Tool Measurements and Recognition
Publication -
Automatic recognition of the arterial input function in MRI studies
Publication: The article describes an automatic method for detecting the arterial input function (AIF). The method was compared with clinically measured series of DSC-MRI images.
-
Corrupted speech intelligibility improvement using adaptive filter based algorithm
Publication: A technique for improving the quality of speech signals recorded in strong noise is presented. The proposed algorithm employing adaptive filtration is described and additional possibilities of speech intelligibility improvement are discussed. Results of the tests are presented.
-
Distortion of speech signals in the listening area: its mechanism and measurements
Publication: The paper deals with the problem of the influence of the number and distribution of loudspeakers in speech reinforcement systems on the quality of publicly addressed voice messages, namely on speech intelligibility in the listening area. Linear superposition of time-shifted broadband waves of the same form and slightly different magnitudes that reach a listener from numerous coherent sources is accompanied by interference effects...
-
Limitations of Emotion Recognition in Software User Experience Evaluation Context
Publication: This paper concerns how an affective-behavioural-cognitive approach applies to the evaluation of the software user experience. Although it may seem that affect recognition solutions are accurate in determining the user experience, there are several challenges in practice. This paper aims to explore the limitations of the automatic affect recognition applied in the usability context as well as...
-
Scoreboard Architectural Pattern and Integration of Emotion Recognition Results
Publication: This paper proposes a new design pattern, named Scoreboard, dedicated to applications solving complex, multi-stage, non-deterministic problems. The pattern provides a computational framework for the design and implementation of systems that integrate a large number of diverse specialized modules that may vary in accuracy, solution level, and modality. The Scoreboard is an extension of the Blackboard design pattern and comes under...
-
A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times
Publication: Objective assessment of speech intelligibility is a complex task that requires taking into account a number of factors, such as the different perception of each speech sub-band by the human sense of hearing or the different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method used for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way...
-
A non-uniform real-time speech time-scale stretching method
Publication: An algorithm for non-uniform real-time speech stretching is presented. It combines the typical SOLA algorithm (Synchronous Overlap and Add) with vowel, consonant and silence detectors. Based on the information about the content and the estimated value of the rate of speech (ROS), the algorithm adapts the scaling factor value. The ability of real-time speech stretching and the resultant quality of voice were...
-
Multiclass AdaBoost Classifier Parameter Adaptation for Pattern Recognition
Publication: The article presents the problem of parameter value selection for the multiclass "one against all" approach of the AdaBoost algorithm in tasks of object recognition based on two-dimensional graphical images. The AdaBoost classifier with Haar features is still used in mobile devices because of its processing speed compared with other methods such as deep learning or SVM, but its main drawback is the need to assemble the results of binary...
-
Extracting concepts from the software requirements specification using natural language processing
Publication: Extracting concepts from the software requirements is one of the first steps on the way to automating the software development process. This task is difficult due to the ambiguity of the natural language used to express the requirements specification. The methods used so far consist mainly of statistical analysis of words and matching expressions with a specific ontology of the domain in which the planned software will be applicable....
-
Recognition of Hand Drawn Flowcharts
Publication: In this paper the problem of hand-drawn flowchart recognition is presented. Two approaches to this problem are described: on-line and off-line. The concept of FCE, a system for recognizing and understanding freehand-drawn on-line flowcharts on desktop computers and mobile devices, is presented. The first experiments with the FCE system and plans for the future are also described.
-
Semantic Integration of Heterogeneous Recognition Systems
Publication: Computer perception of real-life situations is performed using a variety of recognition techniques, including video-based computer vision, biometric systems, RFID devices and others. The proliferation of recognition modules enables development of complex systems by integration of existing components, analogously to the Service Oriented Architecture technology. In the paper, we propose a method that enables integration of information...
-
Using Physiological Signals for Emotion Recognition
Publication: Recognizing a user’s emotions is a promising area of research in the field of human-computer interaction. It is possible to recognize emotions using facial expressions, audio signals, body poses, gestures etc., but physiological signals are very useful in this field because they are spontaneous and not controllable. In this paper the problem of using physiological signals for emotion recognition is presented. The kinds of physiological...
-
Emotions in Polish speech recordings
Open Research Data: The data set presents emotions recorded in sound files that are expressions of Polish speech. Statements were made by 5 men with young voices, aged 21-23. Each person said the following words (nie: no, oddaj: give back, podaj: pass, stop: stop, tak: yes, trzymaj: hold) five times, each time representing a specific emotion, one of three: anger (a),...
-
Study on Speech Transmission under Varying QoS Parameters in an OFDM Communication System
Publication: Although there has been an outbreak of multiple multimedia platforms worldwide, speech communication is still the most essential and important type of service. With the spoken word we can exchange ideas, provide descriptive information, as well as aid to another person. As the amount of available bandwidth continues to shrink, researchers focus on novel types of transmission, based most often on multi-valued modulations, multiple...
-
Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine
Publication: In order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as: available bandwidth, bit error rate (BER), delay, latency as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...
-
Emotion Recognition and Its Applications
Publication: The paper proposes a set of research scenarios to be applied in four domains: software engineering, website customization, education and gaming. The goal of applying the scenarios is to assess the possibility of using emotion recognition methods in these areas. It also points out the problems of defining sets of emotions to be recognized in different applications, representing the defined emotional states, gathering the data and...
-
Preliminary Study on Automatic Recognition of Spatial Expressions in Polish Texts
Publication -
Soft computing based automatic recognition of musical instrument classes.
Publication: The article presents the results of experiments on the automatic recognition of musical instrument classes. Classification was carried out using artificial neural networks, and the feature vector was based on parameters computed from wavelet analysis of musical instrument sounds.