Search results for: audio-visual correlation

Automatic audio signal mixing system based on one-dimensional Wave-U-Net autoencoders

Publication

D. Koszewski

- Year 2023

The purpose of this dissertation is to develop a song mixing system that can automatically produce a good-quality mix in any music genre. The work first recalls the audio signal processing methods used in audio mixing and then describes selected methods for automatic audio mixing. Next, a novel architecture based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. Models...

Full text available to download

Testing A Novel Gesture-Based Mixing Interface

Publication

- JOURNAL OF THE AUDIO ENGINEERING SOCIETY - Year 2013

With a digital audio workstation, unlike with the traditional mouse-and-keyboard computer interface, hand gestures can be used to mix audio with eyes closed. In the experiments, mixing with a visual representation of audio parameters led to a wider panorama and more intensive use of shelving equalizers. Listening tests showed that the use of hand gestures produces mixes that are aesthetically as good as those obtained using...

Full text available to download

Quality Analysis of Audio-Video Transmission in an OFDM-Based Communication System

Publication

M. Zamłyńska
G. Debita
P. Falkowski-Gilski

- Year 2022

A reliable audio-video communication system brings many advantages. With the spoken word we can exchange ideas, provide descriptive information, and offer aid to another person. With visual information available, one can monitor the surroundings, the working environment, etc. As the amount of available bandwidth continues to shrink, researchers are focusing on novel types of transmission. Currently, orthogonal frequency...

Full text to download in external service
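The abstract is cut off before it describes the system under test, so the following is only a generic illustration of the OFDM principle it refers to, not the authors' setup. It is a minimal sketch assuming Gray-coded QPSK on 8 subcarriers, a 2-sample cyclic prefix, and an ideal noiseless channel; the function names and parameters are hypothetical. Data symbols are placed on subcarriers with an inverse DFT and recovered with a forward DFT.

```python
import cmath
from typing import List

# Assumed Gray-coded QPSK constellation: (b0, b1) -> complex symbol
QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}


def idft(X: List[complex]) -> List[complex]:
    """Naive inverse DFT: maps subcarrier symbols to time-domain samples."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def dft(x: List[complex]) -> List[complex]:
    """Naive forward DFT: maps time-domain samples back to subcarriers."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def ofdm_modulate(bits: List[int], n_sub: int = 8, cp_len: int = 2) -> List[complex]:
    """Hypothetical helper: one OFDM symbol carrying 2 bits per subcarrier."""
    if len(bits) != 2 * n_sub:
        raise ValueError("need exactly 2 bits per subcarrier")
    symbols = [QPSK[(bits[2 * i], bits[2 * i + 1])] for i in range(n_sub)]
    body = idft(symbols)
    prefix = body[len(body) - cp_len:]  # cyclic prefix copies the tail
    return prefix + body


def ofdm_demodulate(samples: List[complex], n_sub: int = 8, cp_len: int = 2) -> List[int]:
    """Drop the cyclic prefix, return to subcarriers, and slice each QPSK symbol."""
    body = samples[cp_len:cp_len + n_sub]
    bits: List[int] = []
    for s in dft(body):
        bits.append(0 if s.imag > 0 else 1)  # first bit follows the imaginary sign
        bits.append(0 if s.real > 0 else 1)  # second bit follows the real sign
    return bits
```

A real receiver would also equalize the channel per subcarrier before slicing; the naive O(N²) DFT stands in for an FFT only to keep the sketch dependency-free.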

Surgical tool tracking by on-line selection of structural correlation filters

Publication

- Year 2017

In visual tracking of surgical instruments, correlation filtering finds the best candidate, i.e. the one with the maximal correlation peak. However, most trackers capture only the target's appearance and not its structure. In this paper we propose a surgical instrument tracking approach that integrates prior knowledge related to the rotation of both the shaft and the tool tips. To this end, we employ a rigid parts mixtures model of an instrument. The rigidly...

Full text to download in external service
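The selection step mentioned in the abstract, choosing the candidate with the maximal correlation peak, can be sketched as follows. This is a minimal illustration assuming a brute-force zero-mean spatial correlation of a fixed template over a grayscale frame; it is not the learned structural correlation filters of the paper (which are usually trained and evaluated in the Fourier domain), and the function name and list-of-lists image format are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]


def correlation_peak(frame: Image, template: Image) -> Tuple[int, int, float]:
    """Slide the template over every valid position in the frame and return
    (row, col, score) of the position with the highest zero-mean correlation."""
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    size = th * tw
    t_mean = sum(map(sum, template)) / size
    best = (0, 0, float("-inf"))
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            patch = [row[c:c + tw] for row in frame[r:r + th]]
            p_mean = sum(map(sum, patch)) / size
            # Correlation of mean-removed patch and template
            score = sum((patch[i][j] - p_mean) * (template[i][j] - t_mean)
                        for i in range(th) for j in range(tw))
            if score > best[2]:
                best = (r, c, score)
    return best


# Usage: a diagonal 2x2 pattern hidden at row 2, column 1 of a 5x5 frame
frame = [[0.0] * 5 for _ in range(5)]
frame[2][1] = frame[3][2] = 1.0
print(correlation_peak(frame, [[1.0, 0.0], [0.0, 1.0]])[:2])  # (2, 1)
```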

Methodology and technology for the polymodal allophonic speech transcription

Publication

- Journal of the Acoustical Society of America - Year 2016

A method for automatic audiovisual transcription of speech employing acoustic and visual speech representations is developed. It combines the audio and visual modalities, which yields a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e. the changes in the articulatory setting of speech organs for...

Full text to download in external service

Methodology and technology for the polymodal allophonic speech transcription

Publication

- Journal of the Acoustical Society of America - Year 2016

A method for automatic audiovisual transcription of speech employing acoustic, electromagnetic articulography, and visual speech representations is developed. It combines the audio and visual modalities, which yields a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e., the changes in the articulatory...

Full text to download in external service

Automatic sound recognition for security purposes

Publication

P. Żwan

- Year 2008

This paper presents an automatic sound recognition system. It forms part of a larger security system developed to monitor outdoor places for atypical audio-visual events. The analyzed audio signal is recorded by a microphone mounted outdoors, so it contains non-stationary noise of significant energy. A specially designed algorithm for outdoor noise reduction is presented in the paper,...

Post-production of a Video Recording with Surround Sound (Postprodukcja nagrania wideo z dźwiękiem dookolnym)

Publication

- Year 2009

One of the aims of this paper is to present issues related to audio-video correlation. These are presented on the basis of a short film produced using surround microphone techniques. First, related work in the domain of sound and vision correlation is presented. Then the assumptions concerning scene creation for both audio and video are briefly described. Another objective is to discuss the results of subjective tests...

Energy Efficiency Study of Audio-video Content Consumption on Selected Android Mobile Terminals

Publication

- Year 2021

Mobile devices are used by billions of users worldwide. Since their main advantage is portability, they should remain fully operational for as long as possible without the need to recharge or connect them to external power sources. This paper describes a study carried out on four mobile devices with different hardware and software parameters, all running the Android operating system. The research campaign involved...

Full text to download in external service

Quality Evaluation of Novel DTD Algorithm Based on Audio Watermarking

Publication

- Year 2011

Echo cancellers typically employ a doubletalk detection (DTD) algorithm to keep the adaptive filter from diverging in the presence of near-end speech or other disruptive sounds in the microphone signal. The authors introduced a novel doubletalk detection algorithm based on techniques similar to those used for audio signal watermarking. The application of the described DTD algorithm within acoustic echo cancellation...

Full text to download in external service
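For context on what a DTD decides, here is a minimal sketch of the classic Geigel detector. This is a swapped-in, well-known baseline and not the watermarking-based algorithm described in the abstract. It flags doubletalk whenever the microphone sample exceeds a fraction of the recent far-end peak; the window length of 128 samples and the threshold of 0.5 (a commonly quoted default that assumes at least 6 dB of echo path loss) are assumptions, and the function name is hypothetical.

```python
from typing import List


def geigel_dtd(far_end: List[float], mic: List[float],
               window: int = 128, threshold: float = 0.5) -> List[bool]:
    """Return one flag per microphone sample: True means doubletalk is declared,
    so the echo canceller should freeze adaptation for that sample.
    Assumes far_end and mic are time-aligned and of equal length."""
    flags: List[bool] = []
    for n in range(len(mic)):
        start = max(0, n - window + 1)
        # Peak far-end magnitude over the last `window` samples
        recent_far = max((abs(v) for v in far_end[start:n + 1]), default=0.0)
        flags.append(abs(mic[n]) > threshold * recent_far)
    return flags
```

The detector is cheap but crude: any near-end sound louder than the expected echo, including noise, stops adaptation, which is one motivation for more elaborate schemes such as the one proposed in the paper.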

Building Knowledge for the Purpose of Lip Speech Identification

Publication

- Advances in Intelligent Systems and Computing - Year 2017

This study presents the consecutive stages of building knowledge for automatic lip speech identification. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared, taking into account the frequency of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...

Full text to download in external service

Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions

Publication

- Year 2016

Automatic speech recognition (ASR) is under constant development, especially for cases in which speech is produced casually, acquired in varied environmental conditions, or recorded in the presence of background noise. Phonetic transcription is an important step in full speech recognition and is the main focus of the presented work. ASR is widely implemented in mobile device technology, but...

Full text to download in external service

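Analysis of Road Surface Condition and Vehicle Classes Based on Parameters Extracted from the Audio Signal (Analiza stanu nawierzchni i klas pojazdów na podstawie parametrów ekstrahowanych z sygnału fonicznego)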

Publication

- Zeszyty Naukowe Wydziału Elektrotechniki i Automatyki Politechniki Gdańskiej - Year 2016

The aim of the research is to find feature-vector parameters, extracted from the audio signal, for the automatic recognition of road surface condition and vehicle type. First, the influence of weather conditions on the spectral characteristics of the audio signal recorded near passing vehicles is presented. Next, the audio signal was parameterized and a correlation analysis was carried out in order to...

Full text available to download
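The abstract mentions a correlation analysis of audio-signal parameters but is truncated before describing it. A minimal sketch of one common form of such an analysis, assuming Pearson correlation between each candidate feature and a numeric target (for example a 0/1 dry/wet road label), with features ranked by absolute correlation; the feature names and values in the usage example are hypothetical and are not data from the study.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_features(features: Dict[str, List[float]],
                  target: List[float]) -> List[Tuple[str, float]]:
    """Rank features by the magnitude of their correlation with the target."""
    scores = [(name, pearson(values, target)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Usage with hypothetical per-recording features and a dry (0) / wet (1) label
ranked = rank_features(
    {"zcr": [0.5, 0.4, 0.5, 0.4], "hf_energy": [0.1, 0.2, 0.8, 0.9]},
    target=[0, 0, 1, 1],
)
print(ranked[0][0])  # hf_energy
```

Correlation only captures linear, one-feature-at-a-time dependence, so it is best treated as a first screening step before training a classifier.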

Multiple Cues-Based Robust Visual Object Tracking Method

Publication

B. Khan
A. Jalil
A. Ali
K. Alkhaledi
K. Mehmood
K. M. Cheema
M. Murad
H. Tariq
A. M. El-Sherbeeny

- Electronics - Year 2022

Visual object tracking is still considered a challenging task in the computer vision research community. The object of interest undergoes significant appearance changes because of illumination variation, deformation, motion blur, background clutter, and occlusion. Kernelized correlation filter (KCF)-based tracking schemes have shown good performance in recent years. The accuracy and robustness of these trackers can be further enhanced...

Full text available to download

A comparative study of English viseme recognition methods and algorithms

Publication

- MULTIMEDIA TOOLS AND APPLICATIONS - Year 2018

The paper considers an elementary visual unit, the viseme, in the context of preparing the feature vector as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is to review various approaches to the problem, implement algorithms proposed in the literature, and compare their effectiveness. In the course of the study an optimal feature vector construction...

Full text available to download

A comparative study of English viseme recognition methods and algorithm

Publication

- MULTIMEDIA TOOLS AND APPLICATIONS - Year 2018

The paper considers an elementary visual unit, the viseme, in the context of preparing the feature vector as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is to review various approaches to the problem, implement algorithms proposed in the literature, and compare their effectiveness. In the course of the study an optimal feature vector...

Full text available to download

Visual Attention Distribution Based Assessment of User's Skill in Electronic Medical Record Navigation

Publication

T. Kocejko
J. Wtorek
K. Goforth
K. Moidu

- Journal of Medical Imaging and Health Informatics - Year 2015

Currently, the most precise way of determining skill level is an expert's subjective assessment. In this paper we investigate the possibility of using eye-tracking data for a scalar, quantitative, and objective assessment of medical staff competency in EMR system navigation. According to the experiment conducted by Yarbus, the process of observing particular features is associated with thinking. Moreover, eye tracking is...

Full text to download in external service
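The abstract is truncated before the scoring method, so the following only illustrates the general idea of turning a visual attention distribution into a single scalar. It is a minimal sketch assuming fixations already labeled with a hypothetical area of interest (AOI) on the EMR screen, dwell-time shares as the distribution, and similarity to an expert reference measured as one minus the total variation distance; none of these choices are taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Fixation = Tuple[str, float]  # (hypothetical AOI label, fixation duration in ms)


def attention_distribution(fixations: List[Fixation]) -> Dict[str, float]:
    """Share of total fixation time spent in each area of interest."""
    totals: Dict[str, float] = defaultdict(float)
    for aoi, duration in fixations:
        totals[aoi] += duration
    grand = sum(totals.values())
    return {aoi: t / grand for aoi, t in totals.items()} if grand > 0 else {}


def similarity_to_reference(user: Dict[str, float],
                            reference: Dict[str, float]) -> float:
    """Scalar score in [0, 1]: 1 minus the total variation distance
    between the user's and the reference (expert) distributions."""
    aois = set(user) | set(reference)
    tvd = 0.5 * sum(abs(user.get(a, 0.0) - reference.get(a, 0.0)) for a in aois)
    return 1.0 - tvd
```

A score of 1.0 means the user spread attention exactly as the reference did; 0.0 means no overlap at all.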

Gesture-controlled Sound Mixing System With a Sonified Interface

Publication

- Year 2013

In this paper the authors present a novel approach to sound mixing. It is implemented as a system that enables sound to be mixed with hand gestures recognized in a video stream. The system has been developed so that mixing operations can be performed either with or without visual support. To check the hypothesis that the mixing process needs only an auditory display, the influence of audio information visualization on sound...

Full text to download in external service

Multimodal human-computer interfaces based on advanced video and audio analysis

Publication

- Year 2013

The development history of multimodal interfaces is briefly reviewed in the introduction. Examples of applications of multimodal interfaces in educational software and for people with disabilities are presented, including an interactive electronic whiteboard based on video image analysis, an application for controlling computers with mouth gestures, and an audio interface that stretches speech for hearing-impaired people and people who stutter. The Smart...

Full text to download in external service
