Filtry
wszystkich: 296
wybranych: 123
Wyniki wyszukiwania dla: LIP-READING, FACIAL MOTION CAPTURE, SPEECH RECOGNITION, VOCALIC SEGMENTS
-
WYKORZYSTANIE SIECI NEURONOWYCH DO SYNTEZY MOWY WYRAŻAJĄCEJ EMOCJE
PublikacjaW niniejszym artykule przedstawiono analizę rozwiązań do rozpoznawania emocji opartych na mowie i możliwości ich wykorzystania w syntezie mowy z emocjami, wykorzystując do tego celu sieci neuronowe. Przedstawiono aktualne rozwiązania dotyczące rozpoznawania emocji w mowie i metod syntezy mowy za pomocą sieci neuronowych. Obecnie obserwuje się znaczny wzrost zainteresowania i wykorzystania uczenia głębokiego w aplikacjach związanych...
-
Audio-visual aspect of the Lombard effect and comparison with recordings depicting emotional states.
PublikacjaIn this paper an analysis of audio-visual recordings of the Lombard effect is shown. First, audio signal is analyzed indicating the presence of this phenomenon in the recorded sessions. The principal aim, however, was to discuss problems related to extracting differences caused by the Lombard effect, present in the video , i.e. visible as tension and work of facial muscles aligned to an increase in the intensity of the articulated...
-
Robot Eye Perspective in Perceiving Facial Expressions in Interaction with Children with Autism
PublikacjaThe paper concerns automatic facial expression analysis applied in a study of natural “in the wild” interaction between children with autism and a social robot. The paper reports a study that analyzed the recordings captured via a camera located in the eye of a robot. Children with autism exhibit a diverse level of deficits, including ones in social interaction and emotional expression. The aim of the study was to explore the possibility...
-
Evaluation Criteria for Affect-Annotated Databases
PublikacjaIn this paper a set of comprehensive evaluation criteria for affect-annotated databases is proposed. These criteria can be used for evaluation of the quality of a database on the stage of its creation as well as for evaluation and comparison of existing databases. The usefulness of these criteria is demonstrated on several databases selected from affect computing domain. The databases contain different kind of data: video or still...
-
Intelligent multimedia solutions supporting special education needs.
PublikacjaThe role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....
-
Intelligent video and audio applications for learning enhancement
PublikacjaThe role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....
-
Methodology and technology for the polymodal allophonic speech transcription
PublikacjaA method for automatic audiovisual transcription of speech employing: acoustic, electromagnetical articulography and visual speech representations is developed. It adopts a combining of audio and visual modalities, which provide a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e., the changes in the articulatory...
-
Methodology and technology for the polymodal allophonic speech transcription
PublikacjaA method for automatic audiovisual transcription of speech employing: acoustic and visual speech representations is developed. It adopts a combining of audio and visual modalities, which provide a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e. the changes in the articulatory setting of speech organs for...
-
On Facial Expressions and Emotions RGB-D Database
PublikacjaThe goal of this paper is to present the idea of creating reference database of RGB-D video recordings for recognition of facial expressions and emotions. Two different formats of the recordings used for creation of two versions of the database are described and compared using different criteria. Examples of first applications using databases are also presented to evaluate their usefulness.
-
Using Physiological Signals for Emotion Recognition
PublikacjaRecognizing user’s emotions is the promising area of research in a field of human-computer interaction. It is possible to recognize emotions using facial expression, audio signals, body poses, gestures etc. but physiological signals are very useful in this field because they are spontaneous and not controllable. In this paper a problem of using physiological signals for emotion recognition is presented. The kinds of physiological...
-
Investigating Feature Spaces for Isolated Word Recognition
PublikacjaThe study addresses the issues related to the appropriateness of a two-dimensional representation of speech signal for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and...
-
Mining inconsistent emotion recognition results with the multidimensional model
PublikacjaThe paper deals with the challenge of inconsistency in multichannel emotion recognition. The focus of the paper is to explore factors that might influence the inconsistency. The paper reports an experiment that used multi-camera facial expression analysis with multiple recognition systems. The data were analyzed using a multidimensional approach and data mining techniques. The study allowed us to explore camera location, occlusions...
-
An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics
PublikacjaThe speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...
-
Variable Ratio Sample Rate Conversion Based on Fractional Delay Filter
PublikacjaIn this paper a sample rate conversion algorithm which allows for continuously changing resampling ratio has been presented. The proposed implementation is based on a variable fractional delay filter which is implemented by means of a Farrow structure. Coefficients of this structure are computed on the basis of fractional delay filters which are designed using the offset window method. The proposed approach allows us to freely...
-
Biometryczna kontrola dostępu
PublikacjaOpisano szczegółowo algorytm detekcji oraz identyfikacji człowieka na podstawie punktów nodalnych twarzy. Zdefiniowano pojęcia: biometria, proces pomiaru biometrycznego, metody biometrycznej identyfikacji oraz kontrola dostępu. Przedstawiono opis opracowanego systemu biometrycznej identyfikacji wykorzystującego sztuczne sieci neuronowe. Podano wyniki badań oraz przeprowadzono ich wnikliwą dyskusję.Biometrics is the study of automated...
-
Metoda i algorytmy modyfikacji sygnału do celu wspomagania rozumienia mowy przez osoby z pogorszoną rozdzielczością czasową słuchu
PublikacjaPrzedmiotem badań przeprowadzonych w ramach rozprawy są metody modyfikacji czasu trwania sygnału (ang. Time Scale Modification –TSM) mowy operujące w czasie rzeczywistym oraz ocena ich wpływu na rozumienie wypowiedzi przez osoby z pogorszoną rozdzielczością czasową słuchu. Pogorszona rozdzielczość słuchu jest jednym z symptomów związanych z ośrodkowymi zaburzeniami słuchu (ang. Cetnral Auditory Processing Disorder – CAPD). W odróżnieniu...
-
Emotion Recognition - the need for a complete analysis of the phenomenon of expression formation
PublikacjaThis article shows how complex emotions are. This has been proven by the analysis of the changes that occur on the face. The authors present the problem of image analysis for the purpose of identifying emotions. In addition, they point out the importance of recording the phenomenon of the development of emotions on the human face with the use of high-speed cameras, which allows the detection of micro expression. The work that was...
-
Comparison of Methods for Real and Imaginary Motion Classification from EEG Signals
PublikacjaA method for feature extraction and results of classification of EEG signals obtained from performed and imagined motion are presented. A set of 615 features was obtained to serve for the recognition of type and laterality of motion using 8 different classifications approaches. A comparison of achieved classifiers accuracy is presented in the paper, and then conclusions and discussion are provided. Among applied algorithms the...
-
SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM
PublikacjaThe main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...
-
Detection of Face Position and Orientation Using Depth Data
PublikacjaIn this paper an original approach is presented for real-time detection of user's face position and orientation based only on depth channel from a Microsoft Kinect sensor which can be used in facial analysis on scenes with poor lighting conditions where traditional algorithms based on optical channel may have failed. Thus the proposed approach can support, or even replace, algorithms based on optical channel or based on skeleton...
-
Comparative analysis of various transformation techniques for voiceless consonants modeling
PublikacjaIn this paper, a comparison of various transformation techniques, namely Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT) and Discrete Walsh Hadamard Transform (DWHT) are performed in the context of their application to voiceless consonant modeling. Speech features based on these transformation techniques are extracted. These features are mean and derivative values of cepstrum coefficients, derived from each transformation....
-
PHONEME DISTORTION IN PUBLIC ADDRESS SYSTEMS
PublikacjaThe quality of voice messages in speech reinforcement and public address systems is often poor. The sound engineering projects of such systems take care of sound intensity and possible reverberation phenomena in public space without, however, considering the influence of acoustic interference related to the number and distribution of loudspeakers. This paper presents the results of measurements and numerical simulations of the...
-
Marking the Allophones Boundaries Based on the DTW Algorithm
PublikacjaThe paper presents an approach to marking the boundaries of allophones in the speech signal based on the Dynamic Time Warping (DTW) algorithm. Setting and marking of allophones boundaries in continuous speech is a difficult issue due to the mutual influence of adjacent phonemes on each other. It is this neighborhood on the one hand that creates variants of phonemes that is allophones, and on the other hand it affects that the border...
-
Determining Pronunciation Differences in English Allophones Utilizing Audio Signal Parameterization
PublikacjaAn allophonic description of English plosive consonants, based on audio-visual recordings of 600 specially selected words, was developed. First, several speakers were recorded while reading words from a teleprompter. Then, every word was played back from the previously recorded sample read by a phonology expert and each examined speaker repeated a particular word trying to imitate correct pronunciation. The next step consisted...
-
A New Method for Automatic Generation of Animated Motion
PublikacjaA new method for generation of animation with a quality comparable to a natural motion is presented. Proposed algorithm is based on fuzzy description of motion parameters and subjective features. It is assumed that such processing increases naturalness and quality of motion, which is verified by subjective evaluation tests. First, reference motion data are gathered utilizing a motion capture system, then these data are reduced...
-
AffecTube — Chrome extension for YouTube video affective annotations
PublikacjaThe shortage of emotion-annotated video datasets suitable for training and validating machine learning models for facial expression-based emotion recognition stems primarily from the significant effort and cost required for manual annotation. In this paper, we present AffecTube as a comprehensive solution that leverages crowdsourcing to annotate videos directly on the YouTube platform, resulting in ready-to-use emotion-annotated...
-
Centrifugal magnetic fluid seals
PublikacjaSeals perform an important and at times overlooked function in a lubricant system. They prevent or minimize the movement of the lubricant between two surfaces and also minimize the level of contamination in the system. Seals are divided into static and dynamic types. Static seals operate in systems where the mating surfaces do not move relative to each other. In contrast, dynamic seals operate under conditions where the mating surfaces...
-
DevEmo—Software Developers’ Facial Expression Dataset
PublikacjaThe COVID-19 pandemic has increased the relevance of remote activities and digital tools for education, work, and other aspects of daily life. This reality has highlighted the need for emotion recognition technology to better understand the emotions of computer users and provide support in remote environments. Emotion recognition can play a critical role in improving the remote experience and ensuring that individuals are able...
-
The Innovative Faculty for Innovative Technologies
PublikacjaA leaflet describing Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology. Multimedia Systems Department described laboratories and prototypes of: Auditory-visual attention stimulator, Automatic video event detection, Object re-identification application for multi-camera surveillance systems, Object Tracking and Automatic Master-Slave PTZ Camera Positioning System, Passive Acoustic Radar,...
-
Mispronunciation Detection in Non-Native (L2) English with Uncertainty Modeling
PublikacjaA common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare it to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: a) phonemes can be recognized from speech with high accuracy, b) there is a single correct way for a sentence to be pronounced. These assumptions do not always hold, which can result...
-
Objectivization of phonological evaluation of speech elements by means of audio parametrization
PublikacjaThis study addresses two issues related to both machine- and subjective-based speech evaluation by investigating five phonological phenomena related to allophone production. Its aim is to use objective parametrization and phonological classification of the recorded allophones. These allophones were selected as specifically difficult for Polish speakers of English: aspiration, final obstruent devoicing, dark lateral /l/, velar nasal...
-
Speech Analytics Based on Machine Learning
PublikacjaIn this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...
-
Examining Feature Vector for Phoneme Recognition
PublikacjaThe aim of this paper is to analyze usability of descriptors coming from music information retrieval to the phoneme analysis. The case study presented consists in several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...
-
Hand gesture recognition supported by fuzzy rules and Kalman filters
PublikacjaThe paper presents a system based on camera and multimediaprojector enabling a user to control computer applications by dynamic hand gestures. Gesture recognition methodology based on representing hand movement trajectory by motion vectors analysed using fuzzy rule-based inference is first given. For effective hand position tracking Kalman filters are employed. The system engineered is developed using J2SE and C++/OpenCV technology....
-
Comparison of Classification Methods for EEG Signals of Real and Imaginary Motion
PublikacjaThe classification of EEG signals provides an important element of brain-computer interface (BCI) applications, underlying an efficient interaction between a human and a computer application. The BCI applications can be especially useful for people with disabilities. Numerous experiments aim at recognition of motion intent of left or right hand being useful for locked-in-state or paralyzed subjects in controlling computer applications....
-
A comparative study of English viseme recognition methods and algorithm
PublikacjaAn elementary visual unit – the viseme is concerned in the paper in the context of preparing the feature vector as a main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative research on their effectiveness. In the course of the study an optimal feature vector...
-
A comparative study of English viseme recognition methods and algorithms
PublikacjaAn elementary visual unit – the viseme is concerned in the paper in the context of preparing the feature vector as a main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative research on their effectiveness. In the course of the study an optimal feature vector construction...
-
Horizontally-split-drain MAGFET - a highly sensitive magnetic field sensor
PublikacjaWe propose a novel magnetic field sensitive semiconductor device, viz., Horizontally-Split-Drain Magnetic-Field Sensitive Field-Effect Transistor (HSDMAGFET) which can be used to measure or detect steady or variable magnetic fields. Operating principle of the transistor is based on one of the galvanomagnetic phenomena and a Gradual Channel Detachment Effect (GCDE) and is very similar to that of Popovic and Baltes's SDMAGFET. The...
-
Emotion monitoring system for drivers
PublikacjaThis article describes a new approach to the issue of building a driver monitoring system. Actual systems focus, for example, on tracking eyelid and eyebrow movements that result from fatigue. We propose a different approach based on monitoring the state of emotions. Such a system assumes that by using the emotion model based on our own concept, referred to as the reverse Plutchik’s paraboloid of emotions, the recognition of emotions...
-
Just look at to open it up: A biometric verification facility for password autofill to protect electronic documents
PublikacjaElectronic documents constitute specific units of information, and protecting them against unauthorized access is a challenging task. This is because a password protected document may be stolen from its host computer or intercepted while on transfer and exposed to unlimited offline attacks. The key issue is, therefore, making document passwords hard to crack. We propose to augment a common text password authentication interface...
-
Remote Estimation of Video-Based Vital Signs in Emotion Invocation Studies
PublikacjaAbstract— The goal of this study is to examine the influence of various imitated and video invoked emotions on the vital signs (respiratory and pulse rates). We also perform an analysis of the possibility to extract signals from sequences acquired with cost-effective cameras. The preliminary results show that the respiratory rate allows for better separation of some emotions than the pulse rate, yet this relation highly depends...
-
Fuzzy rule-based dynamic gesture recognition employing camera & multimedia projector
PublikacjaIn the paper the system based on camera and multimedia projector enabling a user to control computer applications by dynamic hand gestures is presented. The main objective is to present the gesture recognition methodology which bases on representing hand movement trajectory by motion vectors analyzed using fuzzy rule-based inference. The approach was engineered in the system developed with J2SE and C++ / OpenCV technology. OpenCV...
-
Influence of Thermal Imagery Resolution on Accuracy of Deep Learning based Face Recognition
PublikacjaHuman-system interactions frequently require a retrieval of the key context information about the user and the environment. Image processing techniques have been widely applied in this area, providing details about recognized objects, people and actions. Considering remote diagnostics solutions, e.g. non-contact vital signs estimation and smart home monitoring systems that utilize person’s identity, security is a very important factor....
-
Thermal Images Analysis Methods using Deep Learning Techniques for the Needs of Remote Medical Diagnostics
PublikacjaRemote medical diagnostic solutions have recently gained more importance due to global demographic shifts and play a key role in evaluation of health status during epidemic. Contactless estimation of vital signs with image processing techniques is especially important since it allows for obtaining health status without the use of additional sensors. Thermography enables us to reveal additional details, imperceptible in images acquired...
-
Analysis of allophones based on audio signal recordings and parameterization
PublikacjaThe aim of this study is to develop an allophonic description of English plosive consonants based on recordings of 600 specially selected words. Allophonic variations addressed in the study may have two sources: positional and contextual. The former one depends on the syllabic or prosodic position in which a particular phoneme occurs. Contextual allophony is conditioned by the local phonetic environment. Co-articulation overlapping...
-
Sensing Direction of Human Motion Using Single-Input-Single-Output (SISO) Channel Model and Neural Networks
PublikacjaObject detection Through-the-Walls enables localization and identification of hidden objects behind the walls. While numerous studies have exploited Channel State Information of Multiple Input Multiple Output (MIMO) WiFi and radar devices in association with Artificial Intelligence based algorithms (AI) to detect and localize objects behind walls, this study proposes a novel non-invasive Through-the-Walls human motion direction...
-
Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training
PublikacjaIn the paper we tackle bi-objective execution time and power consumption optimization problem concerning execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...
-
Examining Feature Vector for Phoneme Recognition / Analiza parametrów w kontekście automatycznej klasyfikacji fonemów
PublikacjaThe aim of this paper is to analyze usability of descriptors coming from music information retrieval to the phoneme analysis. The case study presented consists in several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...
-
Voice command recognition using hybrid genetic algorithm
PublikacjaAbstract: Speech recognition is a process of converting the acoustic signal into a set of words, whereas voice command recognition consists in the correct identification of voice commands, usually single words. Voice command recognition systems are widely used in the military, control systems, electronic devices, such as cellular phones, or by people with disabilities (e.g., for controlling a wheelchair or operating a computer...
-
Data regarding a new, vector-enzymatic DNA fragment amplification-expression technology for the construction of artificial, concatemeric DNA, RNA and proteins, as well as biological effects of selected polypeptides obtained using this method
PublikacjaApplications of bioactive peptides and polypeptides are emerging in areas such as drug development and drug delivery systems. These compounds are bioactive, biocompatible and represent a wide range of chemical properties, enabling further adjustments of obtained biomaterials. However, delivering large quantities of peptide derivatives is still challenging. Several methods have been developed for the production of concatemers –...