Search results for: SPEECH EMOTION RECOGNITION - Bridge of Knowledge


Filters

total: 1042
filtered: 718


  • Intra-subject class-incremental deep learning approach for EEG-based imagined speech recognition

    Publication

    - Biomedical Signal Processing and Control - Year 2023

    Brain–computer interfaces (BCIs) aim to decode brain signals and transform them into commands for device operation. The present study aimed to decode the brain activity during imagined speech. The BCI must identify imagined words within a given vocabulary and thus perform the requested action. A possible scenario when using this approach is the gradual addition of new words to the vocabulary using incremental learning methods....

    Full text to download in external service

  • Language material for English audiovisual speech recognition system development (original title: Materiał językowy do wykorzystania w systemie audiowizualnego rozpoznawania mowy angielskiej)

    Publication

    - Year 2013

    The bi-modal speech recognition system requires a two-part language sample, for training and for testing algorithms, which precisely depicts natural English speech. For the purposes of the audio-visual recordings, a training database of 264 sentences (1730 words without repetitions; 5685 sounds) has been created. The language sample reflects vowel and consonant frequencies in natural speech. The recording material reflects both the...

  • Introduction to the special issue on machine learning in acoustics

    Publication
    • Z. Michalopoulou
    • P. Gerstoft
    • B. Kostek
    • M. A. Roch

    - Journal of the Acoustical Society of America - Year 2021

    When we started our Call for Papers for a Special Issue on “Machine Learning in Acoustics” in the Journal of the Acoustical Society of America, our ambition was to invite papers in which machine learning was applied to all acoustics areas. They were listed, but not limited to, as follows: • Music and synthesis analysis • Music sentiment analysis • Music perception • Intelligent music recognition • Musical source separation • Singing...

    Full text available to download

  • USING NEURAL NETWORKS FOR THE SYNTHESIS OF SPEECH EXPRESSING EMOTIONS (original title: WYKORZYSTANIE SIECI NEURONOWYCH DO SYNTEZY MOWY WYRAŻAJĄCEJ EMOCJE)

    Publication

    - Year 2018

    This article presents an analysis of speech-based emotion recognition solutions and the possibilities of using them in the synthesis of speech expressing emotions, employing neural networks for this purpose. Current solutions concerning the recognition of emotions in speech and methods of speech synthesis by means of neural networks are presented. A significant growth is currently observed in the interest in and use of deep learning in applications related...

  • Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions

    Publication

    Automatic speech recognition (ASR) is under constant development, especially in cases when speech is casually produced or is acquired in various environmental conditions, or in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is discussed in the presented work as the main focus in this process. ASR is widely implemented in mobile devices technology, but...

    Full text to download in external service

  • Towards New Mappings between Emotion Representation Models

    Publication

    There are several models for representing emotions in affect-aware applications, and available emotion recognition solutions provide results using diverse emotion models. As multimodal fusion is beneficial in terms of both accuracy and reliability of emotion recognition, one of the challenges is mapping between the models of affect representation. This paper addresses this issue by: proposing a procedure to elaborate new mappings,...

    Full text available to download

  • Towards Emotion Acquisition in IT Usability Evaluation Context

    Publication

    - Year 2015

    The paper concerns extension of IT usability studies with automatic analysis of the emotional state of a user. Affect recognition methods and emotion representation models are reviewed and evaluated for applicability in usability testing procedures. Accuracy of emotion recognition, susceptibility to disturbances, independence from human will and interference with usability testing procedures are...

    Full text to download in external service

  • Emotion monitoring system for drivers

    This article describes a new approach to the issue of building a driver monitoring system. Existing systems focus, for example, on tracking eyelid and eyebrow movements that result from fatigue. We propose a different approach based on monitoring the state of emotions. Such a system assumes that by using the emotion model based on our own concept, referred to as the reverse Plutchik’s paraboloid of emotions, the recognition of emotions...

    Full text available to download

  • Investigating Feature Spaces for Isolated Word Recognition

    Publication

    - Year 2018

    Much attention has been given by researchers to the speech processing task in automatic speech recognition (ASR) over the past decades. The study addresses the issue related to the investigation of the appropriateness of a two-dimensional representation of speech feature spaces for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation...
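The two-dimensional time-frequency representation mentioned above can be illustrated with a magnitude spectrogram computed frame by frame; the following is a minimal sketch (function and parameter names are illustrative, not taken from the paper):

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: a 2-D time-frequency image a CNN could consume.

    Rows correspond to frequency bins, columns to time frames.
    """
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T

# A 1 kHz tone sampled at 8 kHz concentrates its energy in bin 1000/8000*256 = 32.
t = np.arange(8000) / 8000.0
spec = spectrogram(np.sin(2 * np.pi * 1000.0 * t))
```

Such an image can then be fed to a standard 2-D CNN in much the same way as a photograph.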

  • Remote Estimation of Video-Based Vital Signs in Emotion Invocation Studies

    The goal of this study is to examine the influence of various imitated and video-invoked emotions on the vital signs (respiratory and pulse rates). We also perform an analysis of the possibility to extract signals from sequences acquired with cost-effective cameras. The preliminary results show that the respiratory rate allows for better separation of some emotions than the pulse rate, yet this relation highly depends...

    Full text available to download

  • Methodology and technology for the polymodal allophonic speech transcription

    A method for automatic audiovisual transcription of speech employing acoustic and visual speech representations is developed. It combines audio and visual modalities, which provides a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e. the changes in the articulatory setting of speech organs for...

    Full text to download in external service

  • Methodology and technology for the polymodal allophonic speech transcription

    A method for automatic audiovisual transcription of speech employing acoustic, electromagnetic articulography and visual speech representations is developed. It combines audio and visual modalities, which provides a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e., the changes in the articulatory...

    Full text to download in external service

  • Investigating Feature Spaces for Isolated Word Recognition

    Publication
    • P. Treigys
    • G. Korvel
    • G. Tamulevicius
    • J. Bernataviciene
    • B. Kostek

    - Year 2020

    The study addresses the issues related to the appropriateness of a two-dimensional representation of speech signal for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and...

    Full text to download in external service

  • An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics

    Publication

    - Year 2019

    Speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as the basis of a formant speech synthesizer. The proposed models are based on dividing...

    Full text available to download

  • SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM

    The main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...

    Full text available to download

  • Rough Sets Applied to Mood of Music Recognition

    Publication

    - Year 2016

    With the growth of accessible digital music libraries over the past decade, there is a need for research into automated systems for searching, organizing and recommending music. Mood of music is considered as one of the most intuitive criteria for listeners, thus this work is focused on the emotional content of music and its automatic recognition. The research study presented in this work contains an attempt to music emotion recognition...

  • Investigation of educational processes with affective computing methods

    Publication

    This paper concerns the monitoring of educational processes with the use of new technologies for the recognition of human emotions. This paper summarizes results from three experiments, aimed at the validation of applying emotion recognition to e-learning. An analysis of the experiments’ executions provides an evaluation of the emotion elicitation methods used to monitor learners. The comparison of affect recognition algorithms...

    Full text available to download

  • Speech Analytics Based on Machine Learning

    Publication

    In this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...

    Full text to download in external service

  • AffecTube — Chrome extension for YouTube video affective annotations

    Publication

    - SoftwareX - Year 2023

    The shortage of emotion-annotated video datasets suitable for training and validating machine learning models for facial expression-based emotion recognition stems primarily from the significant effort and cost required for manual annotation. In this paper, we present AffecTube as a comprehensive solution that leverages crowdsourcing to annotate videos directly on the YouTube platform, resulting in ready-to-use emotion-annotated...

    Full text available to download

  • Examining Feature Vector for Phoneme Recognition

    Publication

    - Year 2018

    The aim of this paper is to analyze the usability of descriptors coming from music information retrieval for phoneme analysis. The case study presented consists of several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time- and frequency-domain parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...

  • Enhanced voice user interface employing spatial filtration of signals from acoustic vector sensor

    Spatial filtration of sound is introduced to enhance speech recognition accuracy in noisy conditions. An acoustic vector sensor (AVS) is employed. The signals from the AVS probe are processed in order to attenuate the surrounding noise. As a result, the signal-to-noise ratio is increased. An experiment is featured in which speech signals are disturbed by babble noise. The signals before and after spatial filtration are processed...

    Full text to download in external service
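The signal-to-noise ratio gain that such spatial filtration targets is conventionally expressed in decibels; below is a minimal sketch of the metric, assuming the clean reference signal is available (names are illustrative):

```python
import math

def snr_db(signal, noisy):
    """Signal-to-noise ratio in dB; the residual (noisy - signal) is treated as noise."""
    p_signal = sum(s * s for s in signal)
    p_noise = sum((n - s) ** 2 for s, n in zip(signal, noisy))
    return 10.0 * math.log10(p_signal / p_noise)

# A residual of 0.1 on a unit-amplitude signal gives 10*log10(1/0.01) = 20 dB.
gain = snr_db([1.0, -1.0, 1.0, -1.0], [1.1, -0.9, 1.1, -0.9])
```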

  • AN ENGLISH SPEECH CORPUS FOR MULTIMODAL AUTOMATIC SPEECH RECOGNITION (original title: KORPUS MOWY ANGIELSKIEJ DO CELÓW MULTIMODALNEGO AUTOMATYCZNEGO ROZPOZNAWANIA MOWY)

    The paper presents an audiovisual speech corpus containing 31 hours of English speech recordings. The corpus is dedicated to automatic audiovisual speech recognition. It contains video recordings from a high-frame-rate stereovision camera and audio registered by a microphone array and a laptop microphone. Thanks to the inclusion of recordings made in noisy conditions, the corpus...

  • DevEmo—Software Developers’ Facial Expression Dataset

    The COVID-19 pandemic has increased the relevance of remote activities and digital tools for education, work, and other aspects of daily life. This reality has highlighted the need for emotion recognition technology to better understand the emotions of computer users and provide support in remote environments. Emotion recognition can play a critical role in improving the remote experience and ensuring that individuals are able...

    Full text available to download

  • A comparative study of English viseme recognition methods and algorithm

    An elementary visual unit, the viseme, is considered in the paper in the context of preparing the feature vector as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative study of their effectiveness. In the course of the study an optimal feature vector...

    Full text available to download

  • A comparative study of English viseme recognition methods and algorithms

    An elementary visual unit, the viseme, is considered in the paper in the context of preparing the feature vector as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative study of their effectiveness. In the course of the study an optimal feature vector construction...

    Full text available to download

  • Bimodal classification of English allophones employing acoustic speech signal and facial motion capture

    A method for automatic transcription of English speech into International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent the visual data related to lip reading can enhance recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...

    Full text to download in external service

  • Design Elements of Affect Aware Video Games

    Publication

    - Year 2015

    In this paper, issues of the design and development process of affect-aware video games are presented. Several important design aspects of such games are pointed out. A concept of a middleware framework is proposed that separates the development of affect-aware video games from emotion recognition algorithms and input sensor support. Finally, two prototype affect-aware video games are presented that conform to the presented architecture...

    Full text to download in external service

  • Affect aware video games

    Publication

    - Year 2022

    In this chapter the problem of affect-aware video games is described, including such issues as: the emotional model of the player; the design, development and UX testing of affect-aware video games; multimodal emotion recognition; and a featured review of affect-aware video games.

    Full text to download in external service

  • Voice command recognition using hybrid genetic algorithm

    Publication

    Speech recognition is a process of converting the acoustic signal into a set of words, whereas voice command recognition consists in the correct identification of voice commands, usually single words. Voice command recognition systems are widely used in the military, in control systems, in electronic devices such as cellular phones, or by people with disabilities (e.g., for controlling a wheelchair or operating a computer...

    Full text available to download

  • Affective Learning Manifesto – 10 Years Later

    Publication

    - Year 2014

    In 2004 a group of affective computing researchers proclaimed a manifesto of affective learning that outlined the prospects and white spots of research at that time. Ten years passed by and affective computing developed many methods and tools for tracking human emotional states as well as models for affective systems construction. There are multiple examples of affective methods applications in Intelligent Tutoring Systems (ITS)....

  • Robot Eye Perspective in Perceiving Facial Expressions in Interaction with Children with Autism

    Publication

    The paper concerns automatic facial expression analysis applied in a study of natural “in the wild” interaction between children with autism and a social robot. The paper reports a study that analyzed the recordings captured via a camera located in the eye of a robot. Children with autism exhibit a diverse level of deficits, including ones in social interaction and emotional expression. The aim of the study was to explore the possibility...

    Full text to download in external service

  • Noise profiling for speech enhancement employing machine learning models

    Publication

    - Journal of the Acoustical Society of America - Year 2022

    This paper aims to propose a noise profiling method that can be performed in near real-time based on machine learning (ML). To address challenges related to noise profiling effectively, we start with a critical review of the literature background. Then, we outline the experiment performed consisting of two parts. The first part concerns the noise recognition model built upon several baseline classifiers and noise signal features...

    Full text available to download

  • Examining Feature Vector for Phoneme Recognition / Analiza parametrów w kontekście automatycznej klasyfikacji fonemów

    Publication

    - Year 2017

    The aim of this paper is to analyze the usability of descriptors coming from music information retrieval for phoneme analysis. The case study presented consists of several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time- and frequency-domain parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...

  • Using Different Information Channels for Affect-Aware Video Games - A Case Study

    Publication

    - Year 2018

    This paper presents the problem of creating affect-aware video games that use different information channels, such as image, video, physiological signals, input devices, and player’s behaviour, for emotion recognition. Presented case studies of three affect-aware games show certain conditions and limitations for using specific signals to recognize emotions and lead to interesting conclusions.

    Full text to download in external service

  • Vocalic Segments Classification Assisted by Mouth Motion Capture

    Visual features convey important information for automatic speech recognition (ASR), especially in a noisy environment. The purpose of this study is to evaluate to what extent visual data (i.e. lip reading) can enhance recognition accuracy in the multi-modal approach. For that purpose, motion capture markers were placed on speakers' faces to obtain lip tracking data during speaking. Different parameterization strategies were tested...

    Full text to download in external service

  • Affective reactions to playing digital games

    Publication

    - Year 2015

    The paper presents a study of emotional states during gameplay. An experiment with a two-player Tetris game is reported, followed by the analysis of the results - self-reported emotional states as well as the interpretation of physiological signal measurements. The study reveals the diversity of emotional reactions and concludes that a representative player's emotional model is hard to define. Instead, an adaptive approach to emotion...

    Full text to download in external service

  • Automatic recognition of males and females among web browser users based on behavioural patterns of peripherals usage

    Publication

    - Internet Research - Year 2016

    Purpose: The purpose of this paper is to answer the question whether it is possible to recognise the gender of a web browser user on the basis of keystroke dynamics and mouse movements. Design/methodology/approach: An experiment was organised in order to track mouse and keyboard usage using a special web browser plug-in. After collecting the data, a number of parameters describing the users’ keystrokes, mouse movements and clicks...

    Full text to download in external service

  • PHONEME DISTORTION IN PUBLIC ADDRESS SYSTEMS

    Publication

    - Year 2015

    The quality of voice messages in speech reinforcement and public address systems is often poor. The sound engineering projects of such systems take care of sound intensity and possible reverberation phenomena in public space without, however, considering the influence of acoustic interference related to the number and distribution of loudspeakers. This paper presents the results of measurements and numerical simulations of the...

  • Marking the Allophones Boundaries Based on the DTW Algorithm

    Publication

    - Year 2018

    The paper presents an approach to marking the boundaries of allophones in the speech signal based on the Dynamic Time Warping (DTW) algorithm. Setting and marking allophone boundaries in continuous speech is a difficult issue due to the mutual influence of adjacent phonemes on each other. On the one hand, this neighborhood creates variants of phonemes, that is, allophones; on the other hand, it causes the border...
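The core of the DTW algorithm named above is a dynamic-programming alignment of two feature sequences; a minimal pure-Python sketch follows (an illustration, not the authors' implementation):

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]
```

In boundary marking, such a distance could be computed between a reference allophone template and candidate segments of the analyzed signal.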

  • MACHINE LEARNING APPLICATIONS IN RECOGNIZING HUMAN EMOTIONS BASED ON THE EEG

    Publication
    • A. Kastrau
    • M. Koronowski
    • M. Liksza
    • P. Jasik

    - Year 2021

    This study examined the machine learning-based approach allowing the recognition of human emotional states with the use of EEG signals. After a short introduction to the fundamentals of electroencephalography and neural oscillations, the two-dimensional valence-arousal Russell’s model of emotion is described. Next, the assumptions of the performed EEG experiment are presented. Detailed aspects of the data sanitization, including preprocessing,...
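The valence-arousal model mentioned above places emotions in a two-dimensional plane; a toy illustration of mapping a point to a coarse quadrant label (the labels are shorthand for the circumplex quadrants, not the study's classes):

```python
def quadrant_label(valence, arousal):
    """Coarse quadrant of Russell's valence-arousal circumplex; axes scaled to [-1, 1]."""
    if valence >= 0:
        return "happy/excited" if arousal >= 0 else "relaxed/content"
    return "angry/afraid" if arousal >= 0 else "sad/bored"
```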

  • The Innovative Faculty for Innovative Technologies

    A leaflet describing Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology. Multimedia Systems Department described laboratories and prototypes of: Auditory-visual attention stimulator, Automatic video event detection, Object re-identification application for multi-camera surveillance systems, Object Tracking and Automatic Master-Slave PTZ Camera Positioning System, Passive Acoustic Radar,...

    Full text to download in external service

  • Recognizing emotions on the basis of keystroke dynamics

    Publication

    - Year 2015

    The article describes research on recognizing emotional states on the basis of keystroke dynamics. An overview of various studies and applications of emotion recognition based on data coming from the keyboard is presented. Then, the idea of the experiment is presented, i.e. the way of collecting and labeling training data, extracting features and finally training classifiers. Different classification approaches are proposed to be...

    Full text to download in external service

  • Mispronunciation Detection in Non-Native (L2) English with Uncertainty Modeling

    Publication

    - Year 2021

    A common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare them to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: a) phonemes can be recognized from speech with high accuracy, b) there is a single correct way for a sentence to be pronounced. These assumptions do not always hold, which can result...

    Full text to download in external service
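The phoneme comparison step described above can be approximated with an edit distance between the produced and expected phoneme sequences; a sketch under that simplifying assumption (the paper's actual uncertainty modeling is more elaborate):

```python
def phoneme_edit_distance(produced, expected):
    """Levenshtein distance between two phoneme sequences (rolling single-row DP)."""
    n, m = len(produced), len(expected)
    d = list(range(m + 1))
    for i in range(1, n + 1):
        prev, d[0] = d[0], i
        for j in range(1, m + 1):
            cur = d[j]
            substitution = prev + (produced[i - 1] != expected[j - 1])
            d[j] = min(substitution, d[j] + 1, d[j - 1] + 1)
            prev = cur
    return d[m]
```

A non-zero distance flags a potential mispronunciation; allowing several expected variants and taking the minimum distance relaxes the single-correct-pronunciation assumption.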

  • Analysis of human behavioral patterns

    Publication

    - Year 2022

    Widespread usage of Internet and mobile devices entailed growing requirements concerning security which in turn brought about development of biometric methods. However, a specially designed biometric system may infer more about users than just verifying their identity. Proper analysis of users’ characteristics may also tell much about their skills, preferences, feelings. This chapter presents biometric methods applied in several...

    Full text to download in external service

  • Selection of Features for Multimodal Vocalic Segments Classification

    Publication

    English speech recognition experiments are presented employing both: audio signal and Facial Motion Capture (FMC) recordings. The principal aim of the study was to evaluate the influence of feature vector dimension reduction for the accuracy of vocalic segments classification employing neural networks. Several parameter reduction strategies were adopted, namely: Extremely Randomized Trees, Principal Component Analysis and Recursive...

    Full text to download in external service

  • Discovering Rule-Based Learning Systems for the Purpose of Music Analysis

    Publication

    Music analysis and processing aims at understanding information retrieved from music (Music Information Retrieval). For the purpose of music data mining, machine learning (ML) methods or statistical approach are employed. Their primary task is recognition of musical instrument sounds, music genre or emotion contained in music, identification of audio, assessment of audio content, etc. In terms of computational approach, music databases...

    Full text available to download

  • Music Mood Visualization Using Self-Organizing Maps

    Publication

    Due to an increasing amount of music being made available in digital form on the Internet, an automatic organization of music is sought. The paper presents an approach to the graphical representation of the mood of songs based on Self-Organizing Maps. Parameters describing the mood of music are proposed, calculated and then analyzed employing correlation with mood dimensions based on Multidimensional Scaling. A map is created in which...

    Full text available to download
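A self-organizing map of the kind used above can be sketched in a few lines; this toy 1-D SOM over 2-D mood features uses illustrative parameters, not the paper's configuration:

```python
import random

def train_som(data, n_nodes=4, epochs=200, lr=0.5, seed=0):
    """Tiny 1-D self-organizing map over 2-D feature vectors (e.g. mood coordinates)."""
    rng = random.Random(seed)
    nodes = [[rng.random(), rng.random()] for _ in range(n_nodes)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # shrinks both radius and step
        radius = max(decay, 0.01)
        for x in data:
            # best matching unit: node closest to the input vector
            bmu = min(range(n_nodes),
                      key=lambda k: (nodes[k][0] - x[0]) ** 2 + (nodes[k][1] - x[1]) ** 2)
            for k in range(n_nodes):
                if abs(k - bmu) <= radius:    # neighbourhood on the 1-D map
                    step = lr * decay
                    nodes[k][0] += step * (x[0] - nodes[k][0])
                    nodes[k][1] += step * (x[1] - nodes[k][1])
    return nodes

# Two opposite "moods" pull the map apart so each ends up with a dedicated node.
nodes = train_som([(0.0, 0.0), (1.0, 1.0)], n_nodes=2)
```

After training, each song is placed at its best matching unit, which yields the kind of mood map the paper visualizes.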

  • Affect-awareness framework for intelligent tutoring systems

    Publication

    - Year 2013

    The paper proposes a framework for construction of Intelligent Tutoring Systems (ITS), that take into consideration student emotional states and make affective interventions. The paper provides definitions of `affect-aware systems' and `affective interventions' and describes the concept of the affect-awareness framework. The proposed framework separates emotion recognition from its definition, processing and making decisions on...

    Full text to download in external service

  • Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training

    Publication

    In the paper we tackle a bi-objective execution time and power consumption optimization problem concerning the execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...

    Full text available to download

  • Methodology of Affective Intervention Design for Intelligent Systems

    This paper concerns how intelligent systems should be designed to make adequate, valuable and natural affective interventions. The article proposes a process for choosing an affective intervention model for an intelligent system. The process consists of 10 activities that allow for step-by-step design of an affective feedback loop and takes into account the following factors: expected and desired emotional states, characteristics...

    Full text to download in external service