Filtry
wszystkich: 556
Wyniki wyszukiwania dla: AUDIO-VISUAL SPEECH RECOGNITION SYSTEM
-
Józef Kotus dr hab. inż.
Osoby -
Face recognition by humans with gaze-tracking system Cyber-Eye
PublikacjaW celu dokładniejszego zrozumienia sposobu rozpoznawania i zapamiętywania twarzy przez człowieka przeprowadzono doświadczenie na grupie 20 osób z wykorzystaniem wcześniej opracowanego systemu śledzenia fiksacji wzroku Cyber-Oko [3]. Wykorzystując diody i kamerę podczerwieni wraz z dedykowanym oprogramowaniem Cyber-Oko, które pozwala na śledzenie punktu skupienia wzroku na ekranie. Każdej osobie biorącej udział w doświadczeniu pokazano...
-
EURASIP Journal on Audio Speech and Music Processing
Czasopisma -
System przetwarzania i wizualizacji sygnału mowy dla potrzeb lingwistycznych = System of speech signal processing and visualisation of the results
PublikacjaW artykule przedstawiono sposób przetwarzania i wizualizacji sygnału mowy w formie prostego w obsłudze i relatywnie niedrogiego urządzenia do nagrywania sygnału akustycznego oraz przetwarzania cyfrowego wyselekcjonowanych fragmentów i wizualizacji uzyskanych rezultatów przekształceń. Zastosowano do tego celu komputer z kartą dźwiękową. Przetwarzanie cyfrowe oraz wizualizacja dokonywana była w oparciu o program MATLAB bezpośrednio...
-
System przetwarzania i wizualizacji sygnału mowy dla potrzeb lingwistycznych [A system of speech signal processing and visualisation for linguistic purposes]
Publikacja -
KORPUS MOWY ANGIELSKIEJ DO CELÓW MULTIMODALNEGO AUTOMATYCZNEGO ROZPOZNAWANIA MOWY
PublikacjaW referacie zaprezentowano audiowizualny korpus mowy zawierający 31 godzin nagrań mowy w języku angielskim. Korpus dedykowany jest do celów automatycznego audiowizualnego rozpoznawania mowy. Korpus zawiera nagrania wideo pochodzące z szybkoklatkowej kamery stereowizyjnej oraz dźwięk zarejestrowany przez matrycę mikrofonową i mikrofon komputera przenośnego. Dzięki uwzględnieniu nagrań zarejestrowanych w warunkach szumowych korpus...
-
Automatic music signal mixing system based on one-dimensional Wave-U-Net autoencoders
PublikacjaThe purpose of this paper is to show a music mixing system that is capable of automatically mixing separate raw recordings with good quality regardless of the music genre. This work recalls selected methods for automatic audio mixing first. Then, a novel deep model based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. The model is trained on a custom-prepared database. Mixes created using the...
-
Piotr Odya dr inż.
OsobyPiotr Odya urodził się w Gdańsku w 1974. W 1999 roku ukończył z wyróżnieniem studia na Wydziale Elektroniki, Telekomunikacji i Informatyki Politechniki Gdańskiej zdobywając tytuł magistra inżyniera. Praca dyplomowa dotyczyła problemów poprawy jakości dźwięku w studiach emisyjnych współczesnych rozgłośni radiowych.Jego zainteresowania dotyczą montażu wideofonicznego, systemów dźwięku wielokanałowego. W ramach studiów doktoranckich...
-
IEEE Automatic Speech Recognition and Understanding Workshop
Konferencje -
Jan Daciuk dr hab. inż.
OsobyJan Daciuk uzyskał tytuł zawodowy magistra na Wydziale Elektroniki Politechniki Gdańskiej w 1986 roku, a doktorat na wydziale Elektroniki, Telekomunikacji i Informatyki PG w 1999. Pracuje na Wydziale od 1988 roku. Jego zainteresowania naukowe obejmują zastosowania automatów skończonych w przetwarzaniu języka naturalnego i przetwarzaniu mowy. Spędził ponad cztery lata w europejskich uniwersytetach i instytutach naukowych, takich...
-
Objectivization of audio-video correlation assessment experiments
PublikacjaThe purpose of this paper is to present a new method of conducting an audio-visual correlation analysis employing a head-motion-free gaze tracking system. First, a review of related works in the domain of sound and vision correlation is presented. Then assumptions concerning audio-visual scene creation are shortly described. The objectivization process of carrying out correlation tests employing gaze-tracking system is outlined....
-
ISCA Tutorial and Research Workshop Automatic Speech Recognition
Konferencje -
Adam Kupryjanow mgr inż.
Osoby -
Introduction to the special issue on machine learning in acoustics
PublikacjaWhen we started our Call for Papers for a Special Issue on “Machine Learning in Acoustics” in the Journal of the Acoustical Society of America, our ambition was to invite papers in which machine learning was applied to all acoustics areas. They were listed, but not limited to, as follows: • Music and synthesis analysis • Music sentiment analysis • Music perception • Intelligent music recognition • Musical source separation • Singing...
-
Marcin Kulawiak dr hab. inż.
Osoby -
Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention
PublikacjaThis paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...
-
Enhanced voice user interface employing spatial filtration of signals from acoustic vector sensor
PublikacjaSpatial filtration of sound is introduced to enhance speech recognition accuracy in noisy conditions. An acoustic vector sensor (AVS) is employed. The signals from the AVS probe are processed in order to attenuate the surrounding noise. As a result the signal to noise ratio is increased. An experiment is featured in which speech signals are disturbed by babble noise. The signals before and after spatial filtration are processed...
-
SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM
PublikacjaThe main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...
-
Investigating Feature Spaces for Isolated Word Recognition
PublikacjaMuch attention is given by researchers to the speech processing task in automatic speech recognition (ASR) over the past decades. The study addresses the issue related to the investigation of the appropriateness of a two-dimensional representation of speech feature spaces for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and timefrequency signal representation...
-
Gesture-controlled Sound Mixing System With a Sonified Interface
PublikacjaIn this paper the Authors present a novel approach to sound mixing. It is materialized in a system that enables to mix sound with hand gestures recognized in a video stream. The system has been developed in such a way that mixing operations can be performed both with or without visual support. To check the hypothesis that the mixing process needs only an auditory display, the influence of audio information visualization on sound...
-
An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics
PublikacjaThe speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...
-
WYKORZYSTANIE SIECI NEURONOWYCH DO SYNTEZY MOWY WYRAŻAJĄCEJ EMOCJE
PublikacjaW niniejszym artykule przedstawiono analizę rozwiązań do rozpoznawania emocji opartych na mowie i możliwości ich wykorzystania w syntezie mowy z emocjami, wykorzystując do tego celu sieci neuronowe. Przedstawiono aktualne rozwiązania dotyczące rozpoznawania emocji w mowie i metod syntezy mowy za pomocą sieci neuronowych. Obecnie obserwuje się znaczny wzrost zainteresowania i wykorzystania uczenia głębokiego w aplikacjach związanych...
-
Intelligent multimedia solutions supporting special education needs.
PublikacjaThe role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....
-
Intelligent video and audio applications for learning enhancement
PublikacjaThe role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....
-
Marking the Allophones Boundaries Based on the DTW Algorithm
PublikacjaThe paper presents an approach to marking the boundaries of allophones in the speech signal based on the Dynamic Time Warping (DTW) algorithm. Setting and marking of allophones boundaries in continuous speech is a difficult issue due to the mutual influence of adjacent phonemes on each other. It is this neighborhood on the one hand that creates variants of phonemes that is allophones, and on the other hand it affects that the border...
-
Creating a Realible Music Discovery and Recomendation System
PublikacjaThe aim of this paper is to show problems related to creating a reliable music dis-covery system. The SYNAT database that contains audio files is used for the purpose of experiments. The files are divided into 22 classes corresponding to music genres with different cardinality. Of utmost importance for a reliable music recommendation system are the assignment of audio files to their appropriate gen-res and optimum parameterization...
-
Classifying type of vehicles on the basis of data extracted from audio signal characteristics
PublikacjaThe aim of this study is to find and optimize a feature vector for an automatic recognition of the type of vehicles, extracted form an audio signal. First, the influence of weather-based conditions of road surface on spectral characteristic of the audio signal recorded from a passing vehicle in close proximity to the road is discussed. Next, parameterization of the recorded audio signal is performed. For that purpose, the MIRtoolbox,...
-
Grzegorz Szwoch dr hab. inż.
OsobyGrzegorz Szwoch urodził się w 1972 roku w Gdańsku. W latach 1991-1996 studiował na wydziale Elektroniki Politechniki Gdańskiej. W roku 1996 ukończył studia w Zakładzie Inżynierii Dźwięku (obecnie Katedra Systemów Multimedialnych), broniąc pracę dyplomową pt. Modelowanie fizyczne wybranych instrumentów muzycznych. W tym samym roku dołączył do zespołu badawczego Katedry jako uczestnik Studium Doktoranckiego. Od stycznia 2001 roku...
-
Network and Operating System Support for Digital Audio and Video (Network and OS Support for Digital A/V)
Konferencje -
Investigating Feature Spaces for Isolated Word Recognition
PublikacjaThe study addresses the issues related to the appropriateness of a two-dimensional representation of speech signal for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and...
-
Testing A Novel Gesture-Based Mixing Interface
PublikacjaWith a digital audio workstation, in contrast to the traditional mouse-keyboard computer interface, hand gestures can be used to mix audio with eyes closed. Mixing with a visual representation of audio parameters during experiments led to broadening the panorama and a more intensive use of shelving equalizers. Listening tests proved that the use of hand gestures produces mixes that are aesthetically as good as those obtained using...
-
Voice command recognition using hybrid genetic algorithm
PublikacjaAbstract: Speech recognition is a process of converting the acoustic signal into a set of words, whereas voice command recognition consists in the correct identification of voice commands, usually single words. Voice command recognition systems are widely used in the military, control systems, electronic devices, such as cellular phones, or by people with disabilities (e.g., for controlling a wheelchair or operating a computer...
-
Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions
PublikacjaThe aim of the work is to analyze Lombard speech effect in recordings and then modify the speech signal in order to obtain an increase in the improvement of objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of the Lombard speech, and in particular on the effect of increasing the fundamental frequency...
-
Biometria i przetwarzanie mowy 2023
Kursy Online{mlang pl} Celem kursu jest zapoznanie studentów z: metodami ustalania i potwierdzania tożsamości ludzi na podstawie mierzalnych cech organizmu cechami mowy ludzkiej, w szczególności polskiej metodami rozpoznawania mowy metodami syntezy mowy {mlang} {mlang en} The aim of the course is to familiarize the students with: methods of identification and verification of identity of people based on measurable features of their...
-
Biometria i przetwarzanie mowy 2024
Kursy Online{mlang pl} Celem kursu jest zapoznanie studentów z: metodami ustalania i potwierdzania tożsamości ludzi na podstawie mierzalnych cech organizmu cechami mowy ludzkiej, w szczególności polskiej metodami rozpoznawania mowy metodami syntezy mowy {mlang} {mlang en} The aim of the course is to familiarize the students with: methods of identification and verification of identity of people based on measurable features of their...
-
Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning
PublikacjaThe Lombard effect is a phenomenon that results in speech intelligibility improvement when applied to noise. There are many distinctive features of Lombard speech that were recalled in this dissertation. This work proposes the creation of a system capable of improving speech quality and intelligibility in real-time measured by objective metrics and subjective tests. This system consists of three main components: speech type detection,...
-
Further developments of parameterization methods of audio stream analysis for secuirty purposes
PublikacjaThe paper presents an automatic sound recognition algorithm intended for application in an audiovisual security monitoring system. A distributed character of security systems does not allow for simultaneous observation of multiple multimedia streams, thus an automatic recognition algorithm must be introduced. In the paper, a module for the parameterization and automatic detection of audio events is described. The spectral analyses...
-
Auditory-visual attention stimulator
PublikacjaNew approach to lateralization irregularities formation was proposed. The emphasis is put on the relationship between visual and auditory attention stimulation. In this approach hearing is stimulated using time scale modified speech and sight is stimulated by rendering the text of the currently heard speech. Moreover, displayed text is modified using several techniques i.e. zooming, highlighting etc. In the experimental part of...
-
Analiza stanu nawierzchni i klas pojazdów na podstawie parametrów ekstrahowanych z sygnału fonicznego
PublikacjaCelem badań jest poszukiwanie parametrów wektora cech ekstrahowanego z sygnału fonicznego w kontekście automatycznego rozpoznawania stanu nawierzchni jezdni oraz typu pojazdów. W pierwszej kolejności przedstawiono wpływ warunków pogodowych na charakterystykę widmową sygnału fonicznego rejestrowanego przy przejeżdżających pojazdach. Następnie, dokonano parametryzacji sygnału fonicznego oraz przeprowadzano analizę korelacyjną w celu...
-
Magdalena Szuflita-Żurawska
OsobyMagdalena Szuflita-Żurawska jest kierownikiem Sekcji Informacji Naukowo-Technicznej na Politechnice Gdańskiej oraz Liderem Centrum Kompetencji Otwartej Nauki przy Bibliotece Politechniki Gdańskiej. Jej główne zainteresowania badawcze koncentrują się w obszarze komunikacji naukowej oraz otwartych danych badawczych, a także motywacji i produktywności naukowej. Jest odpowiedzialna między innymi za prowadzenie szkoleń dla pracowników...
-
Wykorzystanie systemu komputerowego ALEP-PL w planowaniu rozwoju lokalnych systemów energetycznych
PublikacjaZaprezentowano autorski system komputerowy ALEP-PL, który wspomaga proces planowania rozwoju lokalnych systemów energetycznych. Narzędzie zostało przygotowane z uwzględnieniem metodyki planowania zaawansowanego. System składa się z serwisu internetowego, bazy danych i modułów logiki biznesowej. Serwis internetowy został stworzony w technologii ASP.NET z użyciem środowiska Visual Studio 2010 i serwera baz danych MS SQL Server 2008...
-
Musical Instrument Tagging Using Data Augmentation and Effective Noisy Data Processing
PublikacjaDeveloping signal processing methods to extract information automatically has potential in several applications, for example searching for multimedia based on its audio content, making context-aware mobile applications (e.g., tuning apps), or pre-processing for an automatic mixing system. However, the last-mentioned application needs a significant amount of research to reliably recognize real musical instruments in recordings....
-
Bimodal Emotion Recognition Based on Vocal and Facial Features
PublikacjaEmotion recognition is a crucial aspect of human communication, with applications in fields such as psychology, education, and healthcare. Identifying emotions accurately is challenging, as people use a variety of signals to express and perceive emotions. In this study, we address the problem of multimodal emotion recognition using both audio and video signals, to develop a robust and reliable system that can recognize emotions...
-
Application of gaze tracking technology to quality of experience domain
PublikacjaA new methodological approach to study subjective assessment results employing gaze tracking technology is shown. Notions of Human-Computer Interaction (HCI) and Quality of Experience (QoE) are shortly introduced in the context of their common application. Then, the gaze tracking system developed at the Multimedia Systems Department (MSD) of Gdansk University of Technology (GUT) is presented. A series of audio-visual subjective...
-
Krzysztof Goczyła prof. dr hab. inż.
OsobyKrzysztof Goczyła, profesor zwyczajny Politechniki Gdańskiej, informatyk, specjalista z inżynierii oprogramowania, inżynierii wiedzy i baz danych. Ukończył studia wyższe na Wydziale Elektroniki Politechniki Gdańskiej w 1976 r. jako magister inżynier elektronik w specjalności automatyka. Na Politechnice Gdańskiej pracuje od 1976. Na Wydziale Elektroniki PG w 1982 r. uzyskał doktorat z informatyki, a w 1999 r. habilitację. W 2012...
-
Ranking Speech Features for Their Usage in Singing Emotion Classification
PublikacjaThis paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...
-
Acceleration of decision making in sound event recognition employing supercomputing cluster
PublikacjaParallel processing of audio data streams is introduced to shorten the decision making time in hazardous sound event recognition. A supercomputing cluster environment with a framework dedicated to processing multimedia data streams in real time is used. The sound event recognition algorithms employed are based on detecting foreground events, calculating their features in short time frames, and classifying the events with Support...
-
Automatic Watercraft Recognition and Identification on Water Areas Covered by Video Monitoring as Extension for Sea and River Traffic Supervision Systems
PublikacjaThe article presents the watercraft recognition and identification system as an extension for the presently used visual water area monitoring systems, such as VTS (Vessel Traffic Service) or RIS (River Information Service). The watercraft identification systems (AIS - Automatic Identification Systems) which are presently used in both sea and inland navigation require purchase and installation of relatively expensive transceivers...
-
Multimodal system for diagnosis and polysensory stimulation of subjects with communication disorders
PublikacjaAn experimental multimodal system, designed for polysensory diagnosis and stimulation of persons with impaired communication skills or even non-communicative subjects is presented. The user interface includes an eye tracking device and the EEG monitoring of the subject. Furthermore, the system consists of a device for objective hearing testing and an autostereoscopic projection system designed to stimulate subjects through their...
-
Variable Ratio Sample Rate Conversion Based on Fractional Delay Filter
PublikacjaIn this paper a sample rate conversion algorithm which allows for continuously changing resampling ratio has been presented. The proposed implementation is based on a variable fractional delay filter which is implemented by means of a Farrow structure. Coefficients of this structure are computed on the basis of fractional delay filters which are designed using the offset window method. The proposed approach allows us to freely...