Search results for: audio-visual correlation - Bridge of Knowledge

  • Automatic audio signal mixing system based on one-dimensional Wave-U-Net autoencoders

    Publication

    - Year 2023

    The purpose of this dissertation is to develop an automatic song mixing system capable of producing a good-quality mix of a song in any music genre. The work first reviews the audio signal processing methods used in audio mixing and describes selected methods for automatic audio mixing. Then, a novel architecture based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. Models...

    Full text available to download

  • Testing A Novel Gesture-Based Mixing Interface

    With a digital audio workstation, in contrast to the traditional mouse-keyboard computer interface, hand gestures can be used to mix audio with eyes closed. Mixing with a visual representation of audio parameters during experiments led to broadening the panorama and a more intensive use of shelving equalizers. Listening tests proved that the use of hand gestures produces mixes that are aesthetically as good as those obtained using...

    Full text available to download

  • Quality Analysis of Audio-Video Transmission in an OFDM-Based Communication System

    Publication

    - Year 2022

    The application of a reliable audio-video communication system brings many advantages. With the spoken word we can exchange ideas, provide descriptive information, and aid another person. With visual information available, one can monitor the surroundings, the working environment, etc. As the amount of available bandwidth continues to shrink, researchers focus on novel types of transmission. Currently, orthogonal frequency...

    Full text to download in external service

  • Surgical tool tracking by on-line selection of structural correlation filters

    Publication

    In visual tracking of surgical instruments, correlation filtering finds the best candidate with the maximal correlation peak. However, most trackers capture only target appearance, not target structure. In this paper we propose a surgical instrument tracking approach that integrates prior knowledge related to the rotation of both the shaft and the tool tips. To this end, we employ a rigid parts mixtures model of an instrument. The rigidly...

    Full text to download in external service

  • Józef Kotus dr hab. inż.

  • Methodology and technology for the polymodal allophonic speech transcription

    A method for automatic audiovisual transcription of speech employing acoustic and visual speech representations is developed. It combines audio and visual modalities, which provides a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e. the changes in the articulatory setting of speech organs for...

    Full text to download in external service

  • Methodology and technology for the polymodal allophonic speech transcription

    A method for automatic audiovisual transcription of speech employing acoustic, electromagnetic articulography, and visual speech representations is developed. It combines audio and visual modalities, which provides a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e., the changes in the articulatory...

    Full text to download in external service

  • Automatic sound recognition for security purposes

    Publication

    - Year 2008

    This paper presents an automatic sound recognition system. It forms part of a larger security system developed to monitor outdoor places for atypical audio-visual events. The analyzed audio signal is recorded by a microphone mounted outdoors, so it contains non-stationary noise of significant energy. The paper presents a specially designed algorithm for outdoor noise reduction,...

  • Postprodukcja nagrania wideo z dzwiekiem dookolnym [Post-production of a video recording with surround sound]

    Publication

    One of the aims of this paper is to present issues related to audio-video correlation. They are illustrated by the production of a short film employing surround microphone techniques. First, some related works in the domain of sound and vision correlation are presented. Then the assumptions concerning scene creation, for both audio and video, are briefly described. Another objective is to discuss the results of subjective tests...

  • Energy Efficiency Study of Audio-video Content Consumption on Selected Android Mobile Terminals

    Publication

    Mobile devices are widely used by billions of users worldwide. Thanks to their main advantage, which is portability, they should be fully operational as long as possible, without the need to recharge or connect them to external power sources. This paper describes a study, carried out on four different mobile devices, with different hardware and software parameters, running the Android operating system. The research campaign involved...

    Full text to download in external service

  • Quality Evaluation of Novel DTD Algorithm Based on Audio Watermarking

    Publication

    Echo cancellers typically employ a doubletalk detection (DTD) algorithm to keep the adaptive filter from diverging in the presence of a near-end speech signal or other disruptive sounds in the microphone signal. A novel doubletalk detection algorithm based on techniques similar to those used for audio signal watermarking was introduced by the authors. The application of the described DTD algorithm within acoustic echo cancellation...

    Full text to download in external service

  • Building Knowledge for the Purpose of Lip Speech Identification

    Consecutive stages of building knowledge for automatic lip speech identification are shown in this study. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared taking into account the frequencies of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...

    Full text to download in external service

  • Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions

    Publication

    Automatic speech recognition (ASR) is under constant development, especially for cases in which speech is casually produced, acquired in various environmental conditions, or recorded in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is the main focus of the presented work. ASR is widely implemented in mobile device technology, but...

    Full text to download in external service

  • Analiza stanu nawierzchni i klas pojazdów na podstawie parametrów ekstrahowanych z sygnału fonicznego [Analysis of road surface condition and vehicle classes based on parameters extracted from the audio signal]

    The aim of the research is to find parameters of a feature vector extracted from the audio signal for the automatic recognition of road surface condition and vehicle type. First, the influence of weather conditions on the spectral characteristics of the audio signal recorded near passing vehicles is presented. Then, the audio signal was parameterized and a correlation analysis was carried out in order to...

    Full text available to download

  • Multiple Cues-Based Robust Visual Object Tracking Method

    Publication
    • B. Khan
    • A. Jalil
    • A. Ali
    • K. Alkhaledi
    • K. Mehmood
    • K. M. Cheema
    • M. Murad
    • H. Tariq
    • A. M. El-Sherbeeny

    - Electronics - Year 2022

    Visual object tracking is still considered a challenging task in the computer vision research community. The object of interest undergoes significant appearance changes because of illumination variation, deformation, motion blur, background clutter, and occlusion. Kernelized correlation filter (KCF)-based tracking schemes have shown good performance in recent years. The accuracy and robustness of these trackers can be further enhanced...

    Full text available to download

  • A comparative study of English viseme recognition methods and algorithms

    An elementary visual unit, the viseme, is considered in this paper in the context of preparing the feature vector as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is to review various approaches to the problem, implement algorithms proposed in the literature, and compare their effectiveness. In the course of the study an optimal feature vector construction...

    Full text available to download

  • A comparative study of English viseme recognition methods and algorithm

    An elementary visual unit, the viseme, is considered in this paper in the context of preparing the feature vector as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is to review various approaches to the problem, implement algorithms proposed in the literature, and compare their effectiveness. In the course of the study an optimal feature vector...

    Full text available to download

  • Visual Attention Distribution Based Assessment of User's Skill in Electronic Medical Record Navigation

    Publication

    Currently, the most precise way of reflecting skill level is an expert’s subjective assessment. In this paper we investigate the possibility of using eye tracking data for a scalar, quantitative, and objective assessment of medical staff competency in EMR system navigation. According to the experiment conducted by Yarbus, the process of observing particular features is associated with thinking. Moreover, eye tracking is...

    Full text to download in external service

  • Gesture-controlled Sound Mixing System With a Sonified Interface

    Publication

    - Year 2013

    In this paper the authors present a novel approach to sound mixing. It is implemented in a system that enables sound to be mixed with hand gestures recognized in a video stream. The system has been developed in such a way that mixing operations can be performed both with and without visual support. To check the hypothesis that the mixing process needs only an auditory display, the influence of audio information visualization on sound...

    Full text to download in external service

  • Multimodal human-computer interfaces based on advanced video and audio analysis

    The history of multimodal interface development is reviewed briefly in the introduction. Examples of applications of multimodal interfaces in educational software and for disabled people are presented, including an interactive electronic whiteboard based on video image analysis, an application for controlling computers with mouth gestures, and an audio interface for speech stretching for hearing-impaired and stuttering people. The Smart...

    Full text to download in external service