Filtry
wszystkich: 1498
-
Katalog
wyświetlamy 1000 najlepszych wyników Pomoc
Wyniki wyszukiwania dla: AUTOMATIC SPEECH RECOGNITION
-
Multimodal English corpus for automatic speech recognition
PublikacjaA multimodal corpus developed for research of speech recognition based on audio-visual data is presented. Besides usual video and sound excerpts, the prepared database contains also thermovision images and depth maps. All streams were recorded simultaneously, therefore the corpus enables to examine the importance of the information provided by different modalities. Based on the recordings, it is also possible to develop a speech...
-
An audio-visual corpus for multimodal automatic speech recognition
Publikacjareview of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings is provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...
-
Comparison of Language Models Trained on Written Texts and Speech Transcripts in the Context of Automatic Speech Recognition
Publikacja -
A survey of automatic speech recognition deep models performance for Polish medical terms
PublikacjaAmong the numerous applications of speech-to-text technology is the support of documentation created by medical personnel. There are many available speech recognition systems for doctors. Their effectiveness in languages such as Polish should be verified. In connection with our project in this field, we decided to check how well the popular speech recognition systems work, employing models trained for the general Polish language....
-
Language Models in Speech Recognition
PublikacjaThis chapter describes language models used in speech recognition, It starts by indicating the role and the place of language models in speech recognition. Mesures used to compare language models follow. An overview of n-gram, syntactic, semantic, and neural models is given. It is accompanied by a list of popular software.
-
Automatic sound recognition for security purposes
PublikacjaIn the paper an automatic sound recognition system is presented. It forms a part of a bigger security system developed in order to monitor outdoor places for non-typical audio-visual events. The analyzed audio signal is being recorded from a microphone mounted in an outdoor place thus a non stationary noise of a significant energy is present in it. In the paper an especially designed algorithm for outdoor noise reduction is presented,...
-
System for automatic singing voice recognition
PublikacjaW artykule przedstawiono system automatycznego rozpoznawania jakości i typu głosu śpiewaczego. Przedstawiono bazę danych oraz zaimplementowane parametry. Algorytmem decyzyjnym jest algorytm sztucznych sieci neuronowych. Wytrenowany system decyzyjny osiąga skuteczność ok. 90% w obydwu kategoriach rozpoznawania. Dodatkowo wykazano przy pomocy metod statystycznych, że wyniki działania systemu automatycznej oceny jakości technicznej...
-
Speech recognition system for hearing impaired people.
PublikacjaPraca przedstawia wyniki badań z zakresu rozpoznawania mowy. Tworzony system wykorzystujący dane wizualne i akustyczne będzie ułatwiał trening poprawnego mówienia dla osób po operacji transplantacji ślimaka i innych osób wykazujących poważne uszkodzenia słuchu. Active Shape models zostały wykorzystane do wyznaczania parametrów wizualnych na podstawie analizy kształtu i ruchu ust w nagraniach wideo. Parametry akustyczne bazują na...
-
Optimizing Medical Personnel Speech Recognition Models Using Speech Synthesis and Reinforcement Learning
PublikacjaText-to-Speech synthesis (TTS) can be used to generate training data for building Automatic Speech Recognition models (ASR). Access to medical speech data is because it is sensitive data that is difficult to obtain for privacy reasons; TTS can help expand the data set. Speech can be synthesized by mimicking different accents, dialects, and speaking styles that may occur in a medical language. Reinforcement Learning (RL), in the...
-
Examining Influence of Distance to Microphone on Accuracy of Speech Recognition
PublikacjaThe problem of controlling a machine by the distant-talking speaker without a necessity of handheld or body-worn equipment usage is considered. A laboratory setup is introduced for examination of performance of the developed automatic speech recognition system fed by direct and by distant speech acquired by microphones placed at three different distances from the speaker (0.5 m to 1.5 m). For feature extraction from the voice signal...
-
Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions
PublikacjaAutomatic speech recognition (ASR) is under constant development, especially in cases when speech is casually produced or it is acquired in various environment conditions, or in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is discussed in the presented work as the main focus in this process. ASR is widely implemented in mobile devices technology, but...
-
Visual Lip Contour Detection for the Purpose of Speech Recognition
PublikacjaA method for visual detection of lip contours in frontal recordings of speakers is described and evaluated. The purpose of the method is to facilitate speech recognition with visual features extracted from a mouth region. Different Active Appearance Models are employed for finding lips in video frames and for lip shape and texture statistical description. Search initialization procedure is proposed and error measure values are...
-
Audiovisual speech recognition for training hearing impaired patients
PublikacjaPraca przedstawia system rozpoznawania izolowanych głosek mowy wykorzystujący dane wizualne i akustyczne. Modele Active Shape Models zostały wykorzystane do wyznaczania parametrów wizualnych na podstawie analizy kształtu i ruchu ust w nagraniach wideo. Parametry akustyczne bazują na współczynnikach melcepstralnych. Sieć neuronowa została użyta do rozpoznawania wymawianych głosek na podstawie wektora cech zawierającego oba typy...
-
IEEE Automatic Speech Recognition and Understanding Workshop
Konferencje -
Automatic recognition of therapy progress among children with autism
PublikacjaThe article presents a research study on recognizing therapy progress among children with autism spectrum disorder. The progress is recognized on the basis of behavioural data gathered via five specially designed tablet games. Over 180 distinct parameters are calculated on the basis of raw data delivered via the game flow and tablet sensors - i.e. touch screen, accelerometer and gyroscope. The results obtained confirm the possibility...
-
Automatic prosodic modification in a Text-To-Speech synthesizer of Polish language
PublikacjaPrzedstawiono system syntezy mowy polskiej z funkcją automatycznej modyfikacji prozodii wypowiedzi. Opisane zostały metody automatycznego wyznaczania akcentu i intonacji wypowiedzi. Przedstawiono zastosowanie algorytmów przetwarzania sygnału mowy w procesie kształtowania prozodii. Omówiono wpływ zastosowanych modyfikacji na naturalność brzmienia syntezowanego sygnału. Zastosowana metoda oparta jest na algorytmie TD-PSOLA. Opracowany...
-
Automatic recognition of the arterial input function in MRI studies
PublikacjaArtykuł prezentuje opis automatycznej metody detekcji tętniczej funkcji wejście (AIF). Metoda została porównana z klinicznie pomierzonymi seriami obrazów DSC-MRI.
-
Automatic Emotion Recognition in Children with Autism: A Systematic Literature Review
PublikacjaThe automatic emotion recognition domain brings new methods and technologies that might be used to enhance therapy of children with autism. The paper aims at the exploration of methods and tools used to recognize emotions in children. It presents a literature review study that was performed using a systematic approach and PRISMA methodology for reporting quantitative and qualitative results. Diverse observation channels and modalities...
-
Auditory-model based robust feature selection for speech recognition
Publikacja -
Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition
PublikacjaThe problem of accurate differentiating between the speaker utterance and the noise parts in a speech signal is considered. The influence of utilizing a voice activity detection in speech signals on the accuracy of the automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting the voice activity in clean and noisy...
-
Automatic Singing Voice Recognition EmployingNeural Networks and Rough Sets
PublikacjaCelem badań jest automatyczne rozpoznawanie głosów śpiewaczych w kategorii rodzaju i jakości technicznej śpiewu. W artykule opisano stworzoną bazę danych głosów, która zawiera próbki głosu śpiewaków profesjonalnych i amatorskich. W dalszej części opisano parametry zdefiniowane w oparciu o zjawiska biomechaniczne w narządzie głosu podczas śpiewania. W oparciu o stworzone macierze parametrów wytrenowano i porównano automatyczne klasyfikatory...
-
ISCA Tutorial and Research Workshop Automatic Speech Recognition
Konferencje -
Automatic singing quality recognition employing artificial neural networks
PublikacjaCelem artykułu jest udowodnienie możliwości automatycznej oceny jakości technicznej głosów śpiewaczych. Pokrótce zaprezentowano w nim stworzoną bazę danych głosów śpiewaczych oraz zaimplementowane parametry. Przy pomocy sztucznych sieci neuronowych zaprojektowano system decyzyjny, który oceniono w pięciostopniowej skali jakość techniczną głosu. Przy pomocy metod statystycznych udowodniono, że wyniki generowane przez ten system...
-
Preliminary Study on Automatic Recognition of Spatial Expressions in Polish Texts
Publikacja -
Soft computing based automatic recognition of musical instrument classes.
PublikacjaW artykule przedstawiono wyniki eksperymentów dotyczących automatycznego rozpoznawania klas instrumentów muzycznych. Proces klasyfikacji zrealizowano w oparciu o sztuczne sieci neuronowe, zaś wektor cch został oparty o parametry obliczane w wyniku analizy falkowej dźwięków instrumentów muzycznych.
-
Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets
PublikacjaArtificial Neural Network (ANN) models, specifically Convolutional Neural Networks (CNN), were applied to extract emotions based on spectrograms and mel-spectrograms. This study uses spectrograms and mel-spectrograms to investigate which feature extraction method better represents emotions and how big the differences in efficiency are in this context. The conducted studies demonstrated that mel-spectrograms are a better-suited...
-
Analysis of 2D Feature Spaces for Deep Learning-based Speech Recognition
Publikacjaconvolutional neural network (CNN) which is a class of deep, feed-forward artificial neural network. We decided to analyze audio signal feature maps, namely spectrograms, linear and Mel-scale cepstrograms, and chromagrams. The choice was made upon the fact that CNN performs well in 2D data-oriented processing contexts. Feature maps were employed in the Lithuanian word recognition task. The spectral analysis led to the highest word...
-
Verification of the Parameterization Methods in the Context of Automatic Recognition of Sounds Related to Danger
PublikacjaW artykule opisano aplikację, która automatycznie wykrywa zdarzenia dźwiękowe takie jak: rozbita szyba, wystrzał, wybuch i krzyk. Opisany system składa się z bloku parametryzacji i klasyfikatora. W artykule dokonano porównania parametrów dedykowanych dla tego zastosowania oraz standardowych deskryptorów MPEG-7. Porównano też dwa klasyfikatory: Jeden oparty o Percetron (sieci neuronowe) i drugi oparty o Maszynę wektorów wspierających....
-
Automatic singing voice recognition employing neural networks and rough sets
PublikacjaCelem prac opisanych w referacie jest automatyczne rozpoznawanie głosów śpiewaczych. Do tego celu utworzona została baza nagrań próbek śpiewu profesjonalnego i amatorskiego. Próbki poddane zostały parametryzacji parametrami zaproponowanymi przez autorów ściśle do tego celu. Sposób wyznaczenia parametrów i ich interpretacja fizyczna przedstawione są w referacie. Parametry wprowadzane są do systemów decyzyjnych, klasyfikatorów opartych...
-
A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces
PublikacjaIn this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset with the size of more than 10.000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...
-
Combining visual and acoustic modalities to ease speech recognition by hearing impaired people
PublikacjaArtykuł prezentuje system, którego celem działania jest ułatwienie procesu treningu poprawnej wymowy dla osób z poważnymi wadami słuchu. W analizie mowy wykorzystane zostały parametry akutyczne i wizualne. Do wyznaczenia parametrów wizualnych na podstawie kształtu i ruchu ust zostały wykorzystane modele Active Shape Models. Parametry akustyczne bazują na współczynnikach melcepstralnych. Do klasyfikacji wypowiadanych głosek została...
-
A Concept of Automatic Film Color Grading Based on Music Recognition and Evoked Emotions
PublikacjaThe article presents the aspects of the final selection of the color of shots in film production based on the psychology of color. First of all, the elements of color processing, contrast, saturation or white balance in the film shots were presented and the definition of color grading was given. In the second part of the article the analysis of film music was conducted in the context of stimulating appropriate emotions while watching...
-
Hybrid of Neural Networks and Hidden Markov Models as a modern approach to speech recognition systems
PublikacjaThe aim of this paper is to present a hybrid algorithm that combines the advantages ofartificial neural networks and hidden Markov models in speech recognition for control purpos-es. The scope of the paper includes review of currently used solutions, description and analysis of implementation of selected artificial neural network (NN) structures and hidden Markov mod-els (HMM). The main part of the paper consists of a description...
-
EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY
PublikacjaThe problem of video framerate and audio/video synchronization in audio-visual speech recogni-tion is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...
-
EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY
PublikacjaThe problem of video framerate and audio/video synchronization in audio-visual speech recognition is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...
-
Intra-subject class-incremental deep learning approach for EEG-based imagined speech recognition
PublikacjaBrain–computer interfaces (BCIs) aim to decode brain signals and transform them into commands for device operation. The present study aimed to decode the brain activity during imagined speech. The BCI must identify imagined words within a given vocabulary and thus perform the requested action. A possible scenario when using this approach is the gradual addition of new words to the vocabulary using incremental learning methods....
-
Automatic recognition of males and females among web browser users based on behavioural patterns of peripherals usage
PublikacjaPurpose The purpose of this paper is to answer the question whether it is possible to recognise the gender of a web browser user on the basis of keystroke dynamics and mouse movements. Design/methodology/approach An experiment was organised in order to track mouse and keyboard usage using a special web browser plug-in. After collecting the data, a number of parameters describing the users’ keystrokes, mouse movements and clicks...
-
Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning
PublikacjaThe Lombard effect is a phenomenon that results in speech intelligibility improvement when applied to noise. There are many distinctive features of Lombard speech that were recalled in this dissertation. This work proposes the creation of a system capable of improving speech quality and intelligibility in real-time measured by objective metrics and subjective tests. This system consists of three main components: speech type detection,...
-
Automatic Watercraft Recognition and Identification on Water Areas Covered by Video Monitoring as Extension for Sea and River Traffic Supervision Systems
PublikacjaThe article presents the watercraft recognition and identification system as an extension for the presently used visual water area monitoring systems, such as VTS (Vessel Traffic Service) or RIS (River Information Service). The watercraft identification systems (AIS - Automatic Identification Systems) which are presently used in both sea and inland navigation require purchase and installation of relatively expensive transceivers...
-
Language material for English audiovisual speech recognition system developmen . Materiał językowy do wykorzystania w systemie audiowizualnego rozpoznawania mowy angielskiej
PublikacjaThe bi-modal speech recognition system requires a 2-sample language input for training and for testing algorithms which precisely depicts natural English speech. For the purposes of the audio-visual recordings, a training data base of 264 sentences (1730 words without repetitions; 5685 sounds) has been created. The language sample reflects vowel and consonant frequencies in natural speech. The recording material reflects both the...
-
IEEE International Conference on Automatic Face and Gesture Recognition
Konferencje -
Artur Gańcza mgr inż.
OsobyI received the M.Sc. degree from the Gdańsk University of Technology (GUT), Gdańsk, Poland, in 2019. I am currently a Ph.D. student at GUT, with the Department of Automatic Control, Faculty of Electronics, Telecommunications and Informatics. My professional interests include speech recognition, system identification, adaptive signal processing and linear algebra.
-
Andrzej Czyżewski prof. dr hab. inż.
OsobyProf. zw. dr hab. inż. Andrzej Czyżewski jest absolwentem Wydziału Elektroniki PG (studia magisterskie ukończył w 1982 r.). Pracę doktorską na temat związany z dźwiękiem cyfrowym obronił z wyróżnieniem na Wydziale Elektroniki PG w roku 1987. W 1992 r. przedstawił rozprawę habilitacyjną pt.: „Cyfrowe operacje na sygnałach fonicznych”. Jego kolokwium habilitacyjne zostało przyjęte jednomyślnie w czerwcu 1992 r. w Akademii Górniczo-Hutniczej...
-
Investigating Feature Spaces for Isolated Word Recognition
PublikacjaMuch attention is given by researchers to the speech processing task in automatic speech recognition (ASR) over the past decades. The study addresses the issue related to the investigation of the appropriateness of a two-dimensional representation of speech feature spaces for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and timefrequency signal representation...
-
Methodology and technology for the polymodal allophonic speech transcription
PublikacjaA method for automatic audiovisual transcription of speech employing: acoustic and visual speech representations is developed. It adopts a combining of audio and visual modalities, which provide a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e. the changes in the articulatory setting of speech organs for...
-
Methodology and technology for the polymodal allophonic speech transcription
PublikacjaA method for automatic audiovisual transcription of speech employing: acoustic, electromagnetical articulography and visual speech representations is developed. It adopts a combining of audio and visual modalities, which provide a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e., the changes in the articulatory...
-
Speech Analytics Based on Machine Learning
PublikacjaIn this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...
-
Enhanced voice user interface employing spatial filtration of signals from acoustic vector sensor
PublikacjaSpatial filtration of sound is introduced to enhance speech recognition accuracy in noisy conditions. An acoustic vector sensor (AVS) is employed. The signals from the AVS probe are processed in order to attenuate the surrounding noise. As a result the signal to noise ratio is increased. An experiment is featured in which speech signals are disturbed by babble noise. The signals before and after spatial filtration are processed...
-
Bimodal classification of English allophones employing acoustic speech signal and facial motion capture
PublikacjaA method for automatic transcription of English speech into International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent the visual data related to lip reading can enhance recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...
-
Vocalic Segments Classification Assisted by Mouth Motion Capture
PublikacjaVisual features convey important information for automatic speech recognition (ASR), especially in noisy environment. The purpose of this study is to evaluate to what extent visual data (i.e. lip reading) can enhance recognition accuracy in the multi-modal approach. For that purpose motion capture markers were placed on speakers' faces to obtain lips tracking data during speaking. Different parameterizations strategies were tested...
-
The Innovative Faculty for Innovative Technologies
PublikacjaA leaflet describing Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology. Multimedia Systems Department described laboratories and prototypes of: Auditory-visual attention stimulator, Automatic video event detection, Object re-identification application for multi-camera surveillance systems, Object Tracking and Automatic Master-Slave PTZ Camera Positioning System, Passive Acoustic Radar,...
-
Mispronunciation Detection in Non-Native (L2) English with Uncertainty Modeling
PublikacjaA common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare it to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: a) phonemes can be recognized from speech with high accuracy, b) there is a single correct way for a sentence to be pronounced. These assumptions do not always hold, which can result...
-
Bożena Kostek prof. dr hab. inż.
Osoby -
Marking the Allophones Boundaries Based on the DTW Algorithm
PublikacjaThe paper presents an approach to marking the boundaries of allophones in the speech signal based on the Dynamic Time Warping (DTW) algorithm. Setting and marking of allophones boundaries in continuous speech is a difficult issue due to the mutual influence of adjacent phonemes on each other. It is this neighborhood on the one hand that creates variants of phonemes that is allophones, and on the other hand it affects that the border...
-
Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training
PublikacjaIn the paper we tackle bi-objective execution time and power consumption optimization problem concerning execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...
-
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
PublikacjaIn the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpora of annotated Polish speech data. We propose a MPI-based modification of the training program which minimizes the...
-
Integration in Multichannel Emotion Recognition
PublikacjaThe paper concerns integration of results provided by automatic emotion recognition algorithms. It presents both the challenges and the approaches to solve them. Paper shows experimental results of integration. The paper might be of interest to researchers and practitioners who deal with automatic emotion recognition and use more than one solution or multichannel observation.
-
Jan Daciuk dr hab. inż.
OsobyJan Daciuk uzyskał tytuł zawodowy magistra na Wydziale Elektroniki Politechniki Gdańskiej w 1986 roku, a doktorat na wydziale Elektroniki, Telekomunikacji i Informatyki PG w 1999. Pracuje na Wydziale od 1988 roku. Jego zainteresowania naukowe obejmują zastosowania automatów skończonych w przetwarzaniu języka naturalnego i przetwarzaniu mowy. Spędził ponad cztery lata w europejskich uniwersytetach i instytutach naukowych, takich...
-
Limitations of Emotion Recognition from Facial Expressions in e-Learning Context
PublikacjaThe paper concerns technology of automatic emotion recognition applied in e-learning environment. During a study of e-learning process the authors applied facial expressions observation via multiple video cameras. Preliminary analysis of the facial expressions using automatic emotion recognition tools revealed several unexpected results, including unavailability of recognition due to face coverage and significant inconsistency...
-
Applying the Lombard Effect to Speech-in-Noise Communication
PublikacjaThis study explored how the Lombard effect, a natural or artificial increase in speech loudness in noisy environments, can improve speech-in-noise communication. This study consisted of several experiments that measured the impact of different types of noise on synthesizing the Lombard effect. The main steps were as follows: first, a dataset of speech samples with and without the Lombard effect was collected in a controlled setting;...
-
Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically
PublikacjaThe aim of this study is two-fold. First, we perform a series of experiments to examine the interference of different noises on speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Then, we apply this knowledge to detecting speech with the Lombard effect. This is for preparing a dataset for training a machine learning-based...
-
Investigating Feature Spaces for Isolated Word Recognition
PublikacjaThe study addresses the issues related to the appropriateness of a two-dimensional representation of speech signal for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and...
-
Emotion Recognition for Affect Aware Video Games
PublikacjaIn this paper the idea of affect aware video games is presented. A brief review of automatic multimodal affect recognition of facial expressions and emotions is given. The first result of emotions recognition using depth data as well as prototype affect aware video game are presented
-
An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics
PublikacjaThe speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...
-
Introduction to the special issue on machine learning in acoustics
PublikacjaWhen we started our Call for Papers for a Special Issue on “Machine Learning in Acoustics” in the Journal of the Acoustical Society of America, our ambition was to invite papers in which machine learning was applied to all acoustics areas. They were listed, but not limited to, as follows: • Music and synthesis analysis • Music sentiment analysis • Music perception • Intelligent music recognition • Musical source separation • Singing...
-
Automatic Classification of Polish Sign Language Words
PublikacjaIn the article we present the approach to automatic recognition of hand gestures using eGlove device. We present the research results of the system for detection and classification of static and dynamic words of Polish language. The results indicate the usage of eGlove allows to gain good recognition quality that additionally can be improved using additional data sources such as RGB cameras.
-
Automatic music set organizatio based on mood of music / Automatyczna organizacja bazy muzycznej na podstawie nastroju muzyki
PublikacjaThis work is focused on an approach based on the emotional content of music and its automatic recognition. A vector of features describing emotional content of music was proposed. Additionally, a graphical model dedicated to the subjective evaluation of mood of music was created. A series of listening tests was carried out, and results were compared with automatic mood recognition employing SOM (Self Organizing Maps) and ANN (Artificial...
-
Orken Mamyrbayev Professor
Osoby1. Education: Higher. In 2001, graduated from the Abay Almaty State University (now Abay Kazakh National Pedagogical University), in the specialty: Computer science and computerization manager. 2. Academic degree: Ph.D. in the specialty "6D070300-Information systems". The dissertation was defended in 2014 on the topic: "Kazakh soileulerin tanudyn kupmodaldy zhuyesin kuru". Under my supervision, 16 masters, 1 dissertation...
-
SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM
PublikacjaThe main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...
-
Limitations of Emotion Recognition in Software User Experience Evaluation Context
PublikacjaThis paper concerns how an affective-behavioural- cognitive approach applies to the evaluation of the software user experience. Although it may seem that affect recognition solutions are accurate in determining the user experience, there are several challenges in practice. This paper aims to explore the limitations of the automatic affect recognition applied in the usability context as well as...
-
Examining Feature Vector for Phoneme Recognition
PublikacjaThe aim of this paper is to analyze usability of descriptors coming from music information retrieval to the phoneme analysis. The case study presented consists in several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...
-
Guido: a musical score recognition system
PublikacjaThis paper presents an optical music recognition system Guido that can automatically recognize the main musical symbols of music scores that were scanned or taken by a digital camera. The application is based on object model of musical notation and uses linguistic approach for symbol interpretation and error correction. The system offers musical editor with a partially automatic error correction.
-
Mining inconsistent emotion recognition results with the multidimensional model
PublikacjaThe paper deals with the challenge of inconsistency in multichannel emotion recognition. The focus of the paper is to explore factors that might influence the inconsistency. The paper reports an experiment that used multi-camera facial expression analysis with multiple recognition systems. The data were analyzed using a multidimensional approach and data mining techniques. The study allowed us to explore camera location, occlusions...
-
Building Knowledge for the Purpose of Lip Speech Identification
PublikacjaConsecutive stages of building knowledge for automatic lip speech identification are shown in this study. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared taking into account the frequencies of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...
-
Uncertainty in emotion recognition
PublikacjaPurpose–The purpose of this paper is to explore uncertainty inherent in emotion recognition technologiesand the consequences resulting from that phenomenon.Design/methodology/approach–The paper is a general overview of the concept; however, it is basedon a meta-analysis of multiple experimental and observational studies performed over the past couple of years.Findings–The mainfinding of the paper might be summarized as follows:...
-
KORPUS MOWY ANGIELSKIEJ DO CELÓW MULTIMODALNEGO AUTOMATYCZNEGO ROZPOZNAWANIA MOWY
PublikacjaW referacie zaprezentowano audiowizualny korpus mowy zawierający 31 godzin nagrań mowy w języku angielskim. Korpus dedykowany jest do celów automatycznego audiowizualnego rozpoznawania mowy. Korpus zawiera nagrania wideo pochodzące z szybkoklatkowej kamery stereowizyjnej oraz dźwięk zarejestrowany przez matrycę mikrofonową i mikrofon komputera przenośnego. Dzięki uwzględnieniu nagrań zarejestrowanych w warunkach szumowych korpus...
-
WYKORZYSTANIE SIECI NEURONOWYCH DO SYNTEZY MOWY WYRAŻAJĄCEJ EMOCJE
PublikacjaW niniejszym artykule przedstawiono analizę rozwiązań do rozpoznawania emocji opartych na mowie i możliwości ich wykorzystania w syntezie mowy z emocjami, wykorzystując do tego celu sieci neuronowe. Przedstawiono aktualne rozwiązania dotyczące rozpoznawania emocji w mowie i metod syntezy mowy za pomocą sieci neuronowych. Obecnie obserwuje się znaczny wzrost zainteresowania i wykorzystania uczenia głębokiego w aplikacjach związanych...
-
A comparative study of English viseme recognition methods and algorithms
PublikacjaAn elementary visual unit – the viseme is concerned in the paper in the context of preparing the feature vector as a main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative research on their effectiveness. In the course of the study an optimal feature vector construction...
-
A comparative study of English viseme recognition methods and algorithm
PublikacjaAn elementary visual unit – the viseme is concerned in the paper in the context of preparing the feature vector as a main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative research on their effectiveness. In the course of the study an optimal feature vector...
-
Rough Sets Applied to Mood of Music Recognition
PublikacjaWith the growth of accessible digital music libraries over the past decade, there is a need for research into automated systems for searching, organizing and recommending music. Mood of music is considered as one of the most intuitive criteria for listeners, thus this work is focused on the emotional content of music and its automatic recognition. The research study presented in this work contains an attempt to music emotion recognition...
-
Parameters optimization in medicine supporting image recognition algorithms
PublikacjaIn this paper, a procedure of automatic set up of image recognition algorithms' parameters is proposed, for the purpose of reducing the time needed for algorithms' development. The procedure is presented on two medicine supporting algorithms, performing bleeding detection in endoscopic images. Since the algorithms contain multiple parameters which must be specified, empirical testing is usually required to optimise the algorithm's...
-
Voice command recognition using hybrid genetic algorithm
PublikacjaAbstract: Speech recognition is a process of converting the acoustic signal into a set of words, whereas voice command recognition consists in the correct identification of voice commands, usually single words. Voice command recognition systems are widely used in the military, control systems, electronic devices, such as cellular phones, or by people with disabilities (e.g., for controlling a wheelchair or operating a computer...
-
Noise profiling for speech enhancement employing machine learning models
PublikacjaThis paper aims to propose a noise profiling method that can be performed in near real-time based on machine learning (ML). To address challenges related to noise profiling effectively, we start with a critical review of the literature background. Then, we outline the experiment performed consisting of two parts. The first part concerns the noise recognition model built upon several baseline classifiers and noise signal features...
-
Recognition of hazardous acoustic events employing parallel processing on a supercomputing cluster . Rozpoznawanie niebezpiecznych zdarzeń dźwiękowych z wykorzystaniem równoległego przetwarzania na klastrze superkomputerowym
PublikacjaA method for automatic recognition of hazardous acoustic events operating on a super computing cluster is introduced. The methods employed for detecting and classifying the acoustic events are outlined. The evaluation of the recognition engine is provided: both on the training set and using real-life signals. The algorithms yield sufficient performance in practical conditions to be employed in security surveillance systems. The...
-
Examining Feature Vector for Phoneme Recognition / Analiza parametrów w kontekście automatycznej klasyfikacji fonemów
PublikacjaThe aim of this paper is to analyze usability of descriptors coming from music information retrieval to the phoneme analysis. The case study presented consists in several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...
-
Selection of Features for Multimodal Vocalic Segments Classification
PublikacjaEnglish speech recognition experiments are presented employing both: audio signal and Facial Motion Capture (FMC) recordings. The principal aim of the study was to evaluate the influence of feature vector dimension reduction for the accuracy of vocalic segments classification employing neural networks. Several parameter reduction strategies were adopted, namely: Extremely Randomized Trees, Principal Component Analysis and Recursive...
-
Biometria i przetwarzanie mowy 2023
Kursy Online{mlang pl} Celem kursu jest zapoznanie studentów z: metodami ustalania i potwierdzania tożsamości ludzi na podstawie mierzalnych cech organizmu cechami mowy ludzkiej, w szczególności polskiej metodami rozpoznawania mowy metodami syntezy mowy {mlang} {mlang en} The aim of the course is to familiarize the students with: methods of identification and verification of identity of people based on measurable features of their...
-
Biometria i przetwarzanie mowy 2024
Kursy Online{mlang pl} Celem kursu jest zapoznanie studentów z: metodami ustalania i potwierdzania tożsamości ludzi na podstawie mierzalnych cech organizmu cechami mowy ludzkiej, w szczególności polskiej metodami rozpoznawania mowy metodami syntezy mowy {mlang} {mlang en} The aim of the course is to familiarize the students with: methods of identification and verification of identity of people based on measurable features of their...
-
PHONEME DISTORTION IN PUBLIC ADDRESS SYSTEMS
PublikacjaThe quality of voice messages in speech reinforcement and public address systems is often poor. The sound engineering projects of such systems take care of sound intensity and possible reverberation phenomena in public space without, however, considering the influence of acoustic interference related to the number and distribution of loudspeakers. This paper presents the results of measurements and numerical simulations of the...
-
Endoscopic Video Classification with the Consideration of Temporal Patterns
PublikacjaThe article describes a novel approach to automatic recognition and classification of diseases in endoscopic videos. Current directions of research in this field are discussed. Most presented methods focus on processing single frames and do not take into consideration the temporal relationship between continuous classifications. Existing approaches that consider the temporal structure of an incoming frame sequence are focused on...
-
Using MusicXML to evaluate accuracy of OMR Systems
PublikacjaIn this paper a methodology for automatic accuracy evaluation in optical music recognition (OMR) applications is proposed. Presented approach assumes using ground truth images together with digital music scores describing their content. The automatic evaluation algorithm measures differences between the tested score and the reference one, both stored in MusicXML format. Some preliminary test results of this approach are presented...
-
Further developments of parameterization methods of audio stream analysis for secuirty purposes
PublikacjaThe paper presents an automatic sound recognition algorithm intended for application in an audiovisual security monitoring system. A distributed character of security systems does not allow for simultaneous observation of multiple multimedia streams, thus an automatic recognition algorithm must be introduced. In the paper, a module for the parameterization and automatic detection of audio events is described. The spectral analyses...
-
Deep neural networks for data analysis
Kursy OnlineThe aim of the course is to familiarize students with the methods of deep learning for advanced data analysis. Typical areas of application of these types of methods include: image classification, speech recognition and natural language understanding. Celem przedmiotu jest zapoznanie studentów z metodami głębokiego uczenia maszynowego na potrzeby zaawansowanej analizy danych. Do typowych obszarów zastosowań tego typu metod należą:...
-
MACHINE LEARNING–BASED ANALYSIS OF ENGLISH LATERAL ALLOPHONES
PublikacjaAutomatic classification methods, such as artificial neural networks (ANNs), the k-nearest neighbor (kNN) and selforganizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words was created for that purpose, containing positionally and/or contextually conditioned allophones. For each word, a group of 16 native and non-native speakers were audio-video recorded, from which seven native speakers’...
-
The Influence of Selecting Regions from Endoscopic Video Frames on The Efficiency of Large Bowel Disease Recognition Algorithms
PublikacjaThe article presents our research in the field of the automatic diagnosis of large intestine diseases on endoscopic video. It focuses on the methods of selecting regions of interest from endoscopic video frames for further analysis by specialized disease recognition algorithms. Four methods of selecting regions of interest have been discussed: a. trivial, b. with the deletion of characteristic, endoscope specific additions to the...
-
Robot Eye Perspective in Perceiving Facial Expressions in Interaction with Children with Autism
PublikacjaThe paper concerns automatic facial expression analysis applied in a study of natural “in the wild” interaction between children with autism and a social robot. The paper reports a study that analyzed the recordings captured via a camera located in the eye of a robot. Children with autism exhibit a diverse level of deficits, including ones in social interaction and emotional expression. The aim of the study was to explore the possibility...
-
Trustworthy Applications of ML Algorithms in Medicine - Discussion and Preliminary Results for a Problem of Small Vessels Disease Diagnosis.
PublikacjaML algorithms are very effective tools for medical data analyzing, especially at image recognition. Although they cannot be considered as a stand-alone diagnostic tool, because it is a black-box, it can certainly be a medical support that minimize negative effect of human-factors. In high-risk domains, not only the correct diagnosis is important, but also the reasoning behind it. Therefore, it is important to focus on trustworthiness...
-
Towards Emotion Acquisition in IT Usability Evaluation Context
PublikacjaThe paper concerns extension of IT usability studies with automatic analysis of the emotional state of a user. Affect recognition methods and emotion representation models are reviewed and evaluated for applicability in usability testing procedures. Accuracy of emotion recognition, susceptibility to disturbances, independence on human will and interference with usability testing procedures are...
-
Potential and Use of the Googlenet Ann for the Purposes of Inland Water Ships Classification
PublikacjaThis article presents an analysis of the possibilities of using the pre-degraded GoogLeNet artificial neural network to classify inland vessels. Inland water authorities monitor the intensity of the vessels via CCTV. Such classification seems to be an improvement in their statutory tasks. The automatic classification of the inland vessels from video recording is a one of the main objectives of the Automatic Ship Recognition and...
-
Classifying type of vehicles on the basis of data extracted from audio signal characteristics
PublikacjaThe aim of this study is to find and optimize a feature vector for an automatic recognition of the type of vehicles, extracted form an audio signal. First, the influence of weather-based conditions of road surface on spectral characteristic of the audio signal recorded from a passing vehicle in close proximity to the road is discussed. Next, parameterization of the recorded audio signal is performed. For that purpose, the MIRtoolbox,...