Search results for: AUDIOVISUAL SPEECH RECOGNITION

Prof. Haitham Abu-Rub - A Visit to Poland's Gdansk University of Technology

Publication

J. Guziński

- IEEE Industrial Electronics Magazine - Year 2015

Report on visit of Prof. Haitham Abu-Rub in Gdansk University of Technology. Speech on the Smart Grid Centre. Visit in the new smart grid laboratory of the GUT, the Laboratory for Innovative Power Technologies and Integration of Renewable Energy Sources (LINTE^2).

Full text available to download

A Comparison of STI Measured by Direct and Indirect Methods for Interiors Coupled with Sound Reinforcement Systems

Publication

- Year 2018

This paper presents a comparison of STI (Speech Transmission Index) coefficient measurement results carried out by direct and indirect methods. First, acoustic parameters important in the context of public address and sound reinforcement systems are recalled. A measurement methodology is presented that employs various test signals to determine impulse responses. The process of evaluating sound system performance, signals enabling...

Full text to download in external service

Investigation of educational processes with affective computing methods

Publication

- e-mentor - Year 2017

This paper concerns the monitoring of educational processes with the use of new technologies for the recognition of human emotions. This paper summarizes results from three experiments, aimed at the validation of applying emotion recognition to e-learning. An analysis of the experiments’ executions provides an evaluation of the emotion elicitation methods used to monitor learners. The comparison of affect recognition algorithms...

Full text available to download

Rediscovering Automatic Detection of Stuttering and Its Subclasses through Machine Learning—The Impact of Changing Deep Model Architecture and Amount of Data in the Training Set

Publication

P. Filipowicz
B. Kostek

- Applied Sciences-Basel - Year 2023

This work deals with automatically detecting stuttering and its subclasses. An effective classification of stuttering along with its subclasses could find wide application in determining the severity of stuttering by speech therapists, preliminary patient diagnosis, and enabling communication with the previously mentioned voice assistants. The first part of this work provides an overview of examples of classical and deep learning...

Full text available to download

Gesture-based computer control system

Publication

- Elektronika : konstrukcje, technologie, zastosowania - Year 2010

In the paper a system for controlling computer applications by hand gestures is presented. First, selected methods used for gesture recognition are described. The system hardware and a way of controlling a computer by gestures are described. The architecture of the software along with hand gesture recognition methods and algorithms used are presented. Examples of basic and complex gestures recognized by the system are given.

Full text to download in external service

Automatic Classification of Polish Sign Language Words

Publication

- Przegląd Elektrotechniczny - Year 2014

In the article we present the approach to automatic recognition of hand gestures using eGlove device. We present the research results of the system for detection and classification of static and dynamic words of Polish language. The results indicate the usage of eGlove allows to gain good recognition quality that additionally can be improved using additional data sources such as RGB cameras.

Full text available to download

Modeling and Designing Acoustical Conditions of the Interior – Case Study

Publication

- Archives of Acoustics - Year 2016

The primary aim of this research study was to model acoustic conditions of the Courtyard of the Gdańsk University of Technology Main Building, and then to design a sound reinforcement system for this interior. First, results of measurements of the parameters of the acoustic field are presented. Then, the comparison between measured and predicted values using the ODEON program is shown. Collected data indicate a long reverberation...

Full text available to download

Comparative analysis of various transformation techniques for voiceless consonants modeling

Publication

G. Korvel
B. Kostek
O. Kurasova

- International Journal of Computers Communications & Control - Year 2018

In this paper, a comparison of various transformation techniques, namely Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT) and Discrete Walsh Hadamard Transform (DWHT) are performed in the context of their application to voiceless consonant modeling. Speech features based on these transformation techniques are extracted. These features are mean and derivative values of cepstrum coefficients, derived from each transformation....

Full text available to download

Automatic music set organizatio based on mood of music / Automatyczna organizacja bazy muzycznej na podstawie nastroju muzyki

Publication

M. Piotrowska

- Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne - Year 2017

This work is focused on an approach based on the emotional content of music and its automatic recognition. A vector of features describing emotional content of music was proposed. Additionally, a graphical model dedicated to the subjective evaluation of mood of music was created. A series of listening tests was carried out, and results were compared with automatic mood recognition employing SOM (Self Organizing Maps) and ANN (Artificial...

Full text to download in external service

Employing a biofeedback method based on hemispheric synchronization in effective learning

Publication

- Year 2012

In this paper an approach to build a brain computer-based hemispheric synchronization system is presented. The concept utilizes the wireless EEG signal registration and acquisition as well as advanced pre-processing methods. The influence of various filtration techniques of EOG artifacts on brain state recognition is examined. The emphasis is put on brain state recognition using band pass filtration for separation of individual...

Full text to download in external service

Endoscopic Video Classification with the Consideration of Temporal Patterns

Publication

- Year 2012

The article describes a novel approach to automatic recognition and classification of diseases in endoscopic videos. Current directions of research in this field are discussed. Most presented methods focus on processing single frames and do not take into consideration the temporal relationship between continuous classifications. Existing approaches that consider the temporal structure of an incoming frame sequence are focused on...

Wykorzystanie sztucznych sieci neuronowych do wykrywania i rozpoznawania tablic rejestracyjnych na zdjęciach pojazdów

Publication

- Zeszyty Naukowe Wydziału Elektrotechniki i Automatyki Politechniki Gdańskiej - Year 2015

W artykule przedstawiono koncepcję algorytmu wykrywania i rozpoznawania tablic rejestracyjnych (AWiRTR) na obrazach cyfrowych pojazdów. Detekcja i lokalizacja tablic rejestracyjnych oraz wyodrębnienie z obrazu tablicy rejestracyjnej poszczególnych znaków odbywa się z wykorzystaniem podstawowych technik przetwarzania obrazu (przekształcenia morfologiczne, wykrywanie krawędzi) jak i podstawowych danych statystycznych obiektów wykrytych...

Full text available to download

A video monitoring system using ontology-driven identification of threats

Publication

- Year 2009

In this paper, we present a video monitoring systemthat leverages image recognition and ontological reasoningabout threats. In the solution, an image processing subsystemuses video recording of a monitored area and recognizesknown concepts in scenes. Then, a reasoning subsystem uses anontological description of security conditions and informationfrom image recognition to check if a violation of a conditionhas occurred. If a threat...

Full text to download in external service

FEEDB: A multimodal database of facial expressions and emotions

Publication

M. Szwoch

- Year 2013

In this paper a first version of a multimodal FEEDB database of facial expressions and emotions is presented. The database contains labeled RGB-D recordings of people expressing a specific set of expressions that have been recorded using Microsoft Kinect sensor. Such a database can be used for classifier training and testing in face recognition as well as in recognition of facial expressions and human emotions. Also initial experiences...

Full text to download in external service

Towards New Mappings between Emotion Representation Models

Publication

A. Landowska

- Applied Sciences-Basel - Year 2018

There are several models for representing emotions in affect-aware applications, and available emotion recognition solutions provide results using diverse emotion models. As multimodal fusion is beneficial in terms of both accuracy and reliability of emotion recognition, one of the challenges is mapping between the models of affect representation. This paper addresses this issue by: proposing a procedure to elaborate new mappings,...

Full text available to download

Playback detection using machine learning with spectrogram features approach

Publication

- Year 2017

This paper presents 2D image processing approach to playback detection in automatic speaker verification (ASV) systems using spectrograms as speech signal representation. Three feature extraction and classification methods: histograms of oriented gradients (HOG) with support vector machines (SVM), HAAR wavelets with AdaBoost classifier and deep convolutional neural networks (CNN) were compared on different data partitions in respect...

Full text available to download

Endoscopic Videos Deinterlacing and On-Screen Text and Light Flashes Removal and Its Influence on Image Analysis Algorithms' Efficiency

Publication

- International Journal of Image Processing and Visual Communication - Year 2013

In this article, deinterlacing and removing on- screen text and light flashes methods on endoscopic video images are discussed. The research is intended to improve disease recognition algorithms' performance. In the article, four configurations of deinterlacing methods and another four configurations of text and flashes removal methods are described and examined. The efficiency of endoscopic video analysis algorithms is measured...

Full text to download in external service

An electronic nose for quantitative determination of gas concentrations

Publication

- Year 2016

The practical application of human nose for fragrance recognition is severely limited by the fact that our sense of smell is subjective and gets tired easily. Consequen tly, there is considerable need for an instrument that can be a substitution of the human sense of smell. Electronic nose devices from the mid 1980s are used in growing number of applications. They comprise an array of several electrochemical gas sensors...

Full text to download in external service

Towards Emotion Acquisition in IT Usability Evaluation Context

Publication

A. Landowska

- Year 2015

The paper concerns extension of IT usability studies with automatic analysis of the emotional state of a user. Affect recognition methods and emotion representation models are reviewed and evaluated for applicability in usability testing procedures. Accuracy of emotion recognition, susceptibility to disturbances, independence on human will and interference with usability testing procedures are...

Full text to download in external service

Video Semantic Analysis Framework based on Run-time Production Rules - Towards Cognitive Vision

Publication

E. Szczerbicki
C. Toro
C. Sanin

- JOURNAL OF UNIVERSAL COMPUTER SCIENCE - Year 2015

This paper proposes a service-oriented architecture for video analysis which separates object detection from event recognition. Our aim is to introduce new tools to be considered in the pathway towards Cognitive Vision as a support for classical Computer Vision techniques that have been broadly used by the scientific community. In the article, we particularly focus in solving some of the reported scalability issues found in current...

Full text available to download

Intelligent multimedia solutions supporting special education needs.

Publication

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2011

The role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....

Intelligent video and audio applications for learning enhancement

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2011

The role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....

Full text available to download

Evaluation Criteria for Affect-Annotated Databases

Publication

- Year 2015

In this paper a set of comprehensive evaluation criteria for affect-annotated databases is proposed. These criteria can be used for evaluation of the quality of a database on the stage of its creation as well as for evaluation and comparison of existing databases. The usefulness of these criteria is demonstrated on several databases selected from affect computing domain. The databases contain different kind of data: video or still...

Full text to download in external service

Geometric Algebra Model of Distributed Representations

Publication

A. Patyk-Łońska

- Year 2010

Formalism based on GA is an alternative to distributed representation models developed so far-Smolensky's tensor product, Holographic Reduced Representations (HRR) and Binary Spatter Code (BSC). Convolutions are replaced by geometric products, interpretable in terms of geometry which seems to be the most natural language for visualization of higher concepts. This paper recalls the main ideas behind the GA model and investigates...

The project IDENT: Multimodal biometric system for bank client identity verification

Publication

- Year 2017

Biometric identity verification methods are implemented inside a real banking environment comprising: dynamic handwritten signature verification, face recognition, bank cli-ent voice recognition and hand vein distribution verification. A secure communication system based on an intra-bank client-server architecture was designed for this purpose. Hitherto achieved progress within the project is reported in this paper with a focus...

Full text to download in external service

DevEmo—Software Developers’ Facial Expression Dataset

Publication

- Applied Sciences-Basel - Year 2023

The COVID-19 pandemic has increased the relevance of remote activities and digital tools for education, work, and other aspects of daily life. This reality has highlighted the need for emotion recognition technology to better understand the emotions of computer users and provide support in remote environments. Emotion recognition can play a critical role in improving the remote experience and ensuring that individuals are able...

Full text available to download

Methodology for Text Classification using Manually Created Corpora-based Sentiment Dictionary

Publication

- Year 2018

This paper presents the methodology of Textual Content Classification, which is based on a combination of algorithms: preliminary formation of a contextual framework for the texts in particular problem area; manual creation of the Hierarchical Sentiment Dictionary (HSD) on the basis of a topically-oriented Corpus; tonality texts recognition via using HSD for analysing the documents as a collection of topically completed fragments...

Full text available to download

Artificial intelligence support for disease detection in wireless capsule endoscopy images of human large bowel

Publication

J. Cychnerski

- Year 2011

In the work the chosen algorithms of disease recognition in endoscopy images were described and compared for theirs efficiency. The algorithms were estimated with regard to utility for application in computer system's support for digestive system's diagnostics. Estimations were achieved in an advanced testing environment, which was built with use of the large collection of endoscopy movies received from Medical University in Gdańsk....

Sensors integration in the smart home environment - a proposal to solve the problem with user identification

Publication

- Year 2019

In this preliminary study we, investigate the possibility of user recognition techniques suitable on smart home devices like chairs, beds, aiming for low–power, high accuracy and quick response time. We propose the two well know technique: voice speaker recognition and accelerometer signal from device mounted on the chair, and the third one optical system basing on IR LED transmitter/receiver circuit. The preliminary results proved...

Full text to download in external service

Analysis-by-synthesis paradigm evolved into a new concept

Publication

B. Kostek

- Journal of the Acoustical Society of America - Year 2022

This work aims at showing how the well-known analysis-by-synthesis paradigm has recently been evolved into a new concept. However, in contrast to the original idea stating that the created sound should not fail to pass the foolproof synthesis test, the recent development is a consequence of the need to create new data. Deep learning models are greedy algorithms requiring a vast amount of data that, in addition, should be correctly...

Full text to download in external service

DEVELOPMENT OF THE ALGORITHM OF POLISH LANGUAGE FILM REVIEWS PREPROCESSING

Publication

- Rocznik Naukowy Wydzialu Zarzadzania w Ciechanowie - Year 2017

The algorithm and the software for conducting the procedure of Preprocessing of the reviews of films in the Polish language were developed. This algorithm contains the following steps: Text Adaptation Procedure; Procedure of Tokenization; Procedure of Transforming Words into the Byte Format; Part-of-Speech Tagging; Stemming / Lemmatization Procedure; Presentation of Documents in the Vector Form (Vector Space Model) Procedure; Forming...

Full text available to download

A study on signal processing methods applied to hearing aids

Publication

- Year 2016

This paper presents a short survey on current technology available in hearing aids with a focus on digital signal processing techniques used. First, factors influencing the hearing aid effectiveness are introduced. Then, examples of the present DSP methods and strategies are provided. Also, a description of current limitations of hearing aids and future trends of development are shown. Finally, the notion of computational auditory...

Interactions with recognized objects

Publication

J. Rumiński
A. Bujnowski
J. Wtorek
A. Andrushevich
M. Biallas
R. Kistler

- Year 2014

Implicit interaction combined with object recognition techniques opens a new possibility for gathering data and analyzing user behavior for activity and context recognition. The electronic eyewear platform, eGlasses, is being developed, as an integrated and autonomous system to provide interactions with smart environment. In this paper we present a method for the interactions with the recognized objects that can be used for electronic...

Full text to download in external service

Identification of volatile compounds based on the electrocatalytic gas sensor responses

Publication

P. Kalinowski
Ł. Woźniak
A. Strzelczyk
G. Jasiński

- Microelectronics materials and technologies - Year 2012

Measured response in case of electrocatalytic gas sensors is in form of a voltamperometric characteristic. Current-voltage (I-V) response shape depends on the gas type and its concentration. Such response contains significantly more information comparing with typical electrochemical sensors, but is quite difficult to analyze. When I-V curve contains current peaks, position of such peaks can be used...

Discovering Rule-Based Learning Systems for the Purpose of Music Analysis

Publication

G. Korvel
B. Kostek

- Journal of the Acoustical Society of America - Year 2019

Music analysis and processing aims at understanding information retrieved from music (Music Information Retrieval). For the purpose of music data mining, machine learning (ML) methods or statistical approach are employed. Their primary task is recognition of musical instrument sounds, music genre or emotion contained in music, identification of audio, assessment of audio content, etc. In terms of computational approach, music databases...

Full text available to download

MACHINE LEARNING APPLICATIONS IN RECOGNIZING HUMAN EMOTIONS BASED ON THE EEG

Publication

A. Kastrau
M. Koronowski
M. Liksza
P. Jasik

- Year 2021

This study examined the machine learning-based approach allowing the recognition of human emotional states with the use of EEG signals. After a short introduction to the fundamentals of electroencephalography and neural oscillations, the two-dimensional valence-arousal Russell’s model of emotion was described. Next, we present the assumptions of the performed EEG experiment. Detail aspects of the data sanitization including preprocessing,...

Affect aware video games

Publication

M. Szwoch

- Year 2022

In this chapter a problem of affect aware video games is described, including such issue as: emotional model of the player, design, development and UX testing of affect-aware video games, multimodal emotion recognition and a featured review of affect-aware video games.

Full text to download in external service

MACHINE LEARNING–BASED ANALYSIS OF ENGLISH LATERAL ALLOPHONES

Publication

M. Piotrowska
G. Korvel
B. Kostek
T. Ciszewski
A. Czyżewski

- International Journal of Applied Mathematics and Computer Science - Year 2019

Automatic classification methods, such as artificial neural networks (ANNs), the k-nearest neighbor (kNN) and selforganizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words was created for that purpose, containing positionally and/or contextually conditioned allophones. For each word, a group of 16 native and non-native speakers were audio-video recorded, from which seven native speakers’...

Full text available to download

Gesture-based computer control system applied to the interactive whiteboard

Publication

- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2010

In the paper the gesture-based computer control system coupled with the dedicated touchless interactive whiteboard is presented. The system engineered enables a user to control any top-most computer application by using one or both hands gestures. First, a review of gesture recognition applications with a focus on methods and algorithms applied is given. Hardware and software solution of the system consisting of a PC, camera, multimedia...

Gesture-based computer control system applied to the interactive whiteboard

Publication

- Year 2010

In the paper the gesture-based computer control system coupled with the dedicated touchless interactive whiteboard is presented. The system engineered enables a user to control any top-most computer application by using one or both hands gestures. First, a review of gesture recognition applications with a focus on methods and algorithms applied is given. Hardware and software solution of the system consisting of a PC, camera, multimedia...

Full text to download in external service

Quality of graphical markers for the needs of eyewear devices

Publication

A. Kwaśniewska
J. Rumiński
J. Klimiuk-Myszk
F. Jérôme
M. Benoît
P. Isabelle

- Year 2015

in this paper we propose to cast the problem of identification of people, objects or places into an application for smart glasses that decodes information from graphical markers. We focus on analyzing different factors that can have influence on the processes of the automatic recognition of information from a code. The research we present aims at reviewing recognition performances in function of: size of a marker, distance from/to...

Full text to download in external service

Affective Learning Manifesto – 10 Years Later

Publication

A. Landowska

- Year 2014

In 2004 a group of affective computing researchers proclaimed a manifesto of affective learning that outlined the prospects and white spots of research at that time. Ten years passed by and affective computing developed many methods and tools for tracking human emotional states as well as models for affective systems construction. There are multiple examples of affective methods applications in Intelligent Tutoring Systems (ITS)....

Zastosowanie metod eksploracji danych do analizy odpowiedzi czujników gazu

Publication

P. Kalinowski

- Year 2018

Zagadnienia poruszane w niniejszej rozprawie dotyczą zastosowania metod eksploracji danych do analizy odpowiedzi czujników gazu, umożliwiających poprawną identyfikację składu mieszaniny gazowej w elektronicznych systemach rozpoznawania gazu. Elektroniczne systemy rozpoznawania gazu to urządzenia wykorzystujące czujniki gazu oraz odpowiednio dobrane metody analizy danych pomiarowych, zdolne do określenia składu mierzonej mieszaniny...

Full text available to download

Robot Eye Perspective in Perceiving Facial Expressions in Interaction with Children with Autism

Publication

A. Landowska
B. Robins

- Advances in Intelligent Systems and Computing - Year 2020

The paper concerns automatic facial expression analysis applied in a study of natural “in the wild” interaction between children with autism and a social robot. The paper reports a study that analyzed the recordings captured via a camera located in the eye of a robot. Children with autism exhibit a diverse level of deficits, including ones in social interaction and emotional expression. The aim of the study was to explore the possibility...

Full text to download in external service

Trustworthy Applications of ML Algorithms in Medicine - Discussion and Preliminary Results for a Problem of Small Vessels Disease Diagnosis.

Publication

M. Ferlin
Z. Klawikowska
J. Niemierko
M. Grzywińska
A. Kwasigroch
E. Szurowska
M. Grochowski

- Year 2022

ML algorithms are very effective tools for medical data analyzing, especially at image recognition. Although they cannot be considered as a stand-alone diagnostic tool, because it is a black-box, it can certainly be a medical support that minimize negative effect of human-factors. In high-risk domains, not only the correct diagnosis is important, but also the reasoning behind it. Therefore, it is important to focus on trustworthiness...

Full text available to download

Features extraction from the electrocatalytic gas sensor responses

Publication

- Year 2016

One of the types of gas sensors used for detection and identification of toxic-air pollutant is an electrocatalytic gas sensor. The electrocatalytic sensors are working in cyclic voltammetry mode, enable detection of various gases. Their response are in the form of I-V curves which contain information about the type and the concentration of measured volatile compound. However,...

Full text to download in external service

Genre-Based Music Language Modeling with Latent Hierarchical Pitman-Yor Process Allocation

Publication

S. Raczyński
E. Vincent

- IEEE Transactions on Audio Speech and Language Processing - Year 2014

In this work we present a new Bayesian topic model: latent hierarchical Pitman-Yor process allocation (LHPYA), which uses hierarchical Pitman-Yor pr ocess priors for both word and topic distributions, and generalizes a few of the existing topic models, including the latent Dirichlet allocation (LDA), the bi- gram topic model and the hierarchical Pitman-Yor topic model. Using such priors allows for integration of -grams with a topic model,...

Full text to download in external service

Elimination of Impulsive Disturbances From Stereo Audio Recordings Using Vector Autoregressive Modeling and Variable-order Kalman Filtering

Publication

- IEEE Transactions on Audio Speech and Language Processing - Year 2015

This paper presents a new approach to elimination of impulsive disturbances from stereo audio recordings. The proposed solution is based on vector autoregressive modeling of audio signals. Online tracking of signal model parameters is performed using the exponential ly weighted least squares algo- rithm. Detection of noise pulses an d model-based interpolation of the irrevocably distorted sampl es is realized using an adaptive, variable-order...

Full text available to download

Elimination of Impulsive Disturbances From Archive Audio Signals Using Bidirectional Processing

Publication

- IEEE Transactions on Audio Speech and Language Processing - Year 2013

In this application-oriented paper we consider the problem of elimination of impulsive disturbances, such as clicks, pops and record scratches, from archive audio recordings. The proposed approach is based on bidirectional processing—noise pulses are localized by combining the results of forward-time and backward-time signal analysis. Based on the results of specially designed empirical tests (rather than on the results of theoretical analysis),...

Full text available to download

Dynamic Bayesian Networks for Symbolic Polyphonic Pitch Modeling

Publication

S. Raczyński
E. Vincent
S. Sagayama

- IEEE Transactions on Audio Speech and Language Processing - Year 2013

Symbolic pitch modeling is a way of incorporating knowledge about relations between pitches into the process of an- alyzing musical information or signals. In this paper, we propose a family of probabilistic symbolic polyphonic pitch models, which account for both the “horizontal” and the “vertical” pitch struc- ture. These models are formulated as linear or log-linear interpo- lations of up to fi ve sub-models, each of which is...

Full text to download in external service

Search

Filters

Catalog

Category

Year

Options

Search results for: AUDIOVISUAL SPEECH RECOGNITION