Wyniki wyszukiwania dla: audio-video recordings database
-
MODALITY corpus - SPEAKER 35 - COMMANDS C5
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 33 - COMMANDS C4
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 27 - SEQUENCE S3
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 27 - COMMANDS C3
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 33 - SEQUENCE S5
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 27 - SEQUENCE S2
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S1
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S4
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S2
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S5
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S3
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S6
Dane BadawczeThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
ALOFON corpus
Dane BadawczeThe ALOFON corpus is one of the multimodal database of word recordings in English, available at http://www.modality-corpus.org/. The ALOFON corpus is oriented towards the recording of the speech equivalence variants. For this purpose, a total of 7 people who are or speak English with native speaker fluency and a variety of Standard Southern British...
-
Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions
PublikacjaAutomatic speech recognition (ASR) is under constant development, especially in cases when speech is casually produced or it is acquired in various environment conditions, or in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is discussed in the presented work as the main focus in this process. ASR is widely implemented in mobile devices technology, but...
-
On Facial Expressions and Emotions RGB-D Database
PublikacjaThe goal of this paper is to present the idea of creating reference database of RGB-D video recordings for recognition of facial expressions and emotions. Two different formats of the recordings used for creation of two versions of the database are described and compared using different criteria. Examples of first applications using databases are also presented to evaluate their usefulness.
-
Bimodal deep learning model for subjectively enhanced emotion classification in films
PublikacjaThis research delves into the concept of color grading in film, focusing on how color influences the emotional response of the audience. The study commenced by recalling state-of-the-art works that process audio-video signals and associated emotions by machine learning. Then, assumptions of subjective tests for refining and validating an emotion model for assigning specific emotional labels to selected film excerpts were presented....
-
Video recordings of bees at entrance to hives
Dane BadawczeVideo recordings of bees at entrance to hives from 2017-04-22, 2017-04-23 and 2018-05-22. All recordings were made using hand-held full HD camera (Samsung Galaxy S3) and encoded using H.264 video codec (Standard Baseline Profile for mov files from 2017, High Profile for mp4 files from 2018) , 30 FPS and bit rate 14478 kb/s (mov files from 2017) or 16869 kb/s...
-
KORPUS MOWY ANGIELSKIEJ DO CELÓW MULTIMODALNEGO AUTOMATYCZNEGO ROZPOZNAWANIA MOWY
PublikacjaW referacie zaprezentowano audiowizualny korpus mowy zawierający 31 godzin nagrań mowy w języku angielskim. Korpus dedykowany jest do celów automatycznego audiowizualnego rozpoznawania mowy. Korpus zawiera nagrania wideo pochodzące z szybkoklatkowej kamery stereowizyjnej oraz dźwięk zarejestrowany przez matrycę mikrofonową i mikrofon komputera przenośnego. Dzięki uwzględnieniu nagrań zarejestrowanych w warunkach szumowych korpus...
-
International Symposium on Audio, Video, Image Processing and Intelligent Applications
Konferencje -
Evaluation Criteria for Affect-Annotated Databases
PublikacjaIn this paper a set of comprehensive evaluation criteria for affect-annotated databases is proposed. These criteria can be used for evaluation of the quality of a database on the stage of its creation as well as for evaluation and comparison of existing databases. The usefulness of these criteria is demonstrated on several databases selected from affect computing domain. The databases contain different kind of data: video or still...
-
Traffic Noise Analysis Applied to Automatic Vehicle Counting and Classification
PublikacjaProblems related to determining traffic noise characteristics are discussed in the context of automatic dynamic noise analysis based on noise level measurements and traffic prediction models. The obtained analytical results provide the second goal of the study, namely automatic vehicle counting and classification. Several traffic prediction models are presented and compared to the results of in-situ noise level measurements. Synchronized...
-
Visual Lip Contour Detection for the Purpose of Speech Recognition
PublikacjaA method for visual detection of lip contours in frontal recordings of speakers is described and evaluated. The purpose of the method is to facilitate speech recognition with visual features extracted from a mouth region. Different Active Appearance Models are employed for finding lips in video frames and for lip shape and texture statistical description. Search initialization procedure is proposed and error measure values are...
-
Cross-domain applications of multimodal human-computer interfaces
PublikacjaDeveloped multimodal interfaces for education applications and for disabled people are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with mouth gestures and audio interface for speech stretching for hearing impaired and stuttering people and intelligent pen allowing for diagnosing and ameliorating developmental dyslexia. The eye-gaze tracking system named...
-
Automatic music signal mixing system based on one-dimensional Wave-U-Net autoencoders
PublikacjaThe purpose of this paper is to show a music mixing system that is capable of automatically mixing separate raw recordings with good quality regardless of the music genre. This work recalls selected methods for automatic audio mixing first. Then, a novel deep model based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. The model is trained on a custom-prepared database. Mixes created using the...
-
Network and Operating System Support for Digital Audio and Video (Network and OS Support for Digital A/V)
Konferencje -
Ranking Speech Features for Their Usage in Singing Emotion Classification
PublikacjaThis paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...
-
EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY
PublikacjaThe problem of video framerate and audio/video synchronization in audio-visual speech recognition is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...
-
EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY
PublikacjaThe problem of video framerate and audio/video synchronization in audio-visual speech recogni-tion is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...
-
Elimination of impulsive disturbances from stereo audio recordings
PublikacjaThis paper presents a new approach to elimination of impulsive disturbances from stereo audio recordings. The proposed solution is based on vector autoregressive modeling of audio signals. On-line tracking of signal model parameters is performed using the stability-preserving Whittle-Wiggins-Robinson algorithm with exponential data weighting. Detection of noise pulses and model-based interpolation of the irrevocably distorted samples...
-
System do prototypowania bezprzewodowych inteligentnych urządzeń monitoringu audio-video
PublikacjaW komunikacie przedstawiono system prototypowania bezprzewodowych urządzeń do monitoringu audio-video. System bazuje na układach FPGA Virtex6 i wielu dodatkowych wspierających urządzeniach jak: szybka pamięć DDR3, mała kamera HD, mikrofon z konwerterem A/C, moduł radiowy WiFi, itp. Funkcjonalność systemu została szczegółowo opisana w komunikacie. System został zoptymalizowany do pracy pod kontrolą systemu operacyjnego Linux, zostały...
-
Objectivization of audio-video correlation assessment experiments
PublikacjaThe purpose of this paper is to present a new method of conducting an audio-visual correlation analysis employing a head-motion-free gaze tracking system. First, a review of related works in the domain of sound and vision correlation is presented. Then assumptions concerning audio-visual scene creation are shortly described. The objectivization process of carrying out correlation tests employing gaze-tracking system is outlined....
-
Intelligent video and audio applications for learning enhancement
PublikacjaThe role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....
-
Wind Turbines Modeling as the Tool for Developing Algorithms of Processing their Video Recordings
PublikacjaIn the real world, many factors exist disturbing observation of the examined phenomena and causing various noises and distortions in recorded signals. It very often makes it difficult or even impossible to optimize various signal processing algorithms, through finding appropriate parameters. In this paper, we show an application, that retrieves wind turbine rotor speed from recorded video. Next, we describe the process of reduction...
-
Wind Turbines Modeling as the Tool for Developing Algorithms of Processing their Video Recordings
PublikacjaIn the real world, many factors exist disturbing observation of the examined phenomena and causing various noises and distortions in recorded signals. It very often makes it difficult or even impossible to optimize various signal processing algorithms, through finding appropriate parameters. In this paper, we show an application, that retrieves wind turbine rotor speed from recorded video. Next, we describe the process of reduction...
-
RENOVATION OF ARCHIVE AUDIO RECORDINGS USING SPARSE AUTOREGRESSIVE MODELING AND BIDIRECTIONAL PROCESSING
PublikacjaThe paper presents a new approach to elimination of broadband noise and impulsive disturbances from archive audio recordings. The proposed adaptive Kalman-like algorithm, based on a sparse autoregressive model of the audio signal, simultaneously detects noise pulses, interpolates the irrevocably distorted samples and performs signal smoothing. It is shown that bidirectional (forward-backward) processing of the archive signal improves...
-
A study on of music features derived from audio recordings examples – a quantitative analysis
PublikacjaThe paper presents a comparative study of music features derived from audio recordings, i.e. the same music pieces but representing different music genres, excerpts performed by different musicians, and songs performed by a musician, whose style evolved over time. Firstly, the origin and the background of the division of music genres were shortly presented. Then, several objective parameters of an audio signal were recalled that...
-
Comparison of two methods of sound extraction from guitar string video recordings
PublikacjaA comparison of two sound extraction methods from guitar string video recordings is presented in the paper. A brief overview of highframe rate camera technology and possible applications are included. The method using the image analysis from two such cameras is presented. The cameras are placed at the angle of 90 degrees for recording the image in three planes. The results achieved...
-
SYNAT_MUSIC_GENRE_FV_173
Dane BadawczeThis is the original dataset containing 51582 music tracks (22 music genres) and 173 element-feature vector [1-6,9]. A collection of more than 50000 music excerpts described with a set of descriptors obtained through the analysis of 30-second mp3 recordings was gathered in a database called SYNAT. The SYNAT database was realized by the Gdansk University...
-
Quality Analysis of Audio-Video Transmission in an OFDM-Based Communication System
PublikacjaApplication of a reliable audio-video communication system, brings many advantages. With the spoken word we can exchange ideas, provide descriptive information, as well as aid to another person. With the availability of visual information one can monitor the surrounding, working environment, etc. As the amount of available bandwidth continues to shrink, researchers focus on novel types of transmission. Currently, orthogonal frequency...
-
Elimination of Impulsive Disturbances From Stereo Audio Recordings Using Vector Autoregressive Modeling and Variable-order Kalman Filtering
PublikacjaThis paper presents a new approach to elimination of impulsive disturbances from stereo audio recordings. The proposed solution is based on vector autoregressive modeling of audio signals. Online tracking of signal model parameters is performed using the exponential ly weighted least squares algo- rithm. Detection of noise pulses an d model-based interpolation of the irrevocably distorted sampl es is realized using an adaptive, variable-order...
-
Improving methods for detecting people in video recordings using shifting time-windows
PublikacjaWe propose a novel method for improving algorithms which detect the presence of people in video sequences. Our focus is on algorithms for applications which require reporting and analyzing all scenes with detected people in long recordings. Therefore one of the target qualities of the classification result is its stability, understood as a low number of invalid scene boundaries. Many existing methods process images in the recording...
-
SYNAT Music Genre Parameters PCA 19
Dane BadawczeThe dataset contains feature vector after Principal Component Analysis (PCA) performing, so there are 11 music genres and 19-element vector derived from music excerpts. Originally, a feature vector containing 173 elements was conceived in earlier research studies carried out by the team of authors [1-6]. A collection of 52532 music excerpts described...
-
SYNAT_PCA_48
Dane BadawczeThere is a series of datasets containing feature vectors derived from music tracks. The dataset contains 51582 music tracks (22 music genres) and feature vector after Principal Component Analysis (PCA) performing, so there are 48-element vectors derived from music excerpts. Originally, a feature vector containing 173 elements was conceived in earlier...
-
SYNAT_PCA_11
Dane BadawczeThe dataset contains 51582 music tracks (22 music genres) and feature vector after Principal Component Analysis (PCA) performing, so there are 11-element vectors derived from music excerpts. Originally, a feature vector containing 173 elements was conceived in earlier research studies carried out by the team of authors [1-6]. A collection of more than...
-
Piotr Szczuko dr hab. inż.
OsobyDr hab. inż. Piotr Szczuko w 2002 roku ukończył studia na Wydziale Elektroniki, Telekomunikacji i Informatyki Politechniki Gdańskiej zdobywając tytuł magistra inżyniera. Tematem pracy dyplomowej było badanie zjawisk jednoczesnej percepcji obrazu cyfrowego i dźwięku dookólnego. W roku 2008 obronił rozprawę doktorską zatytułowaną "Zastosowanie reguł rozmytych w komputerowej animacji postaci", za którą otrzymał nagrodę Prezesa Rady...
-
Analysis of allophones based on audio signal recordings and parameterization
PublikacjaThe aim of this study is to develop an allophonic description of English plosive consonants based on recordings of 600 specially selected words. Allophonic variations addressed in the study may have two sources: positional and contextual. The former one depends on the syllabic or prosodic position in which a particular phoneme occurs. Contextual allophony is conditioned by the local phonetic environment. Co-articulation overlapping...
-
Multimodal human-computer interfaces based on advanced video and audio analysis
PublikacjaMultimodal interfaces development history is reviewed briefly in the introduction. Examples of applications of multimodal interfaces to education software and for the disabled people are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with mouth gestures and the audio interface for speech stretching for hearing impaired and stuttering people. The Smart...
-
Automatic system for audio-video material reconstruction and archiving
PublikacjaReferat przedstawia propozycję modelu systemu automatycznej archiwizacji i rekonstrukcji nagrań audio-wideo. Założeniem tego rozwiązania jest uczynienie procesu rekonstrukcji nagrań bardziej niezależnym od człowieka. Ma to na celu redukcję kosztów rekonstrukcji przetwarzanych nagrań. Z powodu dużej liczby archiwalnych nagrań audio-wideo istnieje potrzeba stworzenia systemu który umożliwi automatyczną indeksację ich treści. Pomoże...
-
Wireless intelligent audio-video surveillance prototyping system
PublikacjaThe presented system is based on the Virtex6 FPGA and several supporting devices like a fast DDR3 memory, small HD camera, microphone with A/D converter, WiFi radio communication module, etc. The system is controlled by the Linux operating system. The Linux drivers for devices implemented in the system have been prepared. The system has been successfully verified in a H.264 compression accelerator prototype in which the most demanding...
-
Multimodal human-computer interfaces based on advanced video and audio analysis
PublikacjaMultimodal interfaces development history is reviewed briefly in the introduction. Some applications of multimodal interfaces to education software for disabled people are presented. One of them, the LipMouse is a novel, vision-based human-computer interface that tracks user’s lip movements and detect lips gestures. A new approach to diagnosing Parkinson’s disease is also shown. The progression of the disease can be measured employing...