Search results for: audio-video recordings database

total: 322

Video recordings of static hand gestures for gesture based interaction
Open Research Data
open access
- T. Kocejko
This data set contains video recordings of selected simple hand gestures related to sign language. The purpose of the data set is to evaluate different computer algorithms designed for hand gesture detection as well as for hand feature and hand pose detection and identification. The data set contains 5 video recordings in mp4 format. Each recording is...
Energy Efficiency Study of Audio-video Content Consumption on Selected Android Mobile Terminals
Publication
- P. Falkowski-Gilski
- M. Pańkowski
- Year 2021
Mobile devices are widely used by billions of users worldwide. Thanks to their main advantage, which is portability, they should be fully operational as long as possible, without the need to recharge or connect them to external power sources. This paper describes a study, carried out on four different mobile devices, with different hardware and software parameters, running the Android operating system. The research campaign involved...

Full text to download in external service
Piotr Odya dr inż.

People

Department of Multimedia Systems

Piotr Odya was born in Gdansk in 1974. He received his M.Sc. in 1999 from the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Poland. His thesis was related to the problem of sound quality improvement in the contemporary broadcasting studio. He is interested in video editing and multichannel sound systems. Mr. Odya's Ph.D. thesis concerned methods and algorithms for correcting...
Grzegorz Szwoch dr hab. inż.

People

Department of Multimedia Systems

Grzegorz Szwoch was born in 1972 in Gdansk. In 1991-1996 he studied at the Technical University of Gdansk. In 1996 he graduated from the Sound Engineering Department. His thesis was related to physical modeling of musical instruments. Since that time he has been a member of the research staff at the Multimedia Systems Department as a PhD student (1996-2001), Assistant (2001-2004), Assistant professor (2004-2020) and...
QoS/QoE in the Heterogeneous Internet of Things (IoT)
Publication
- K. Nowicki
- T. Uhl
- Year 2017
Applications provided in the Internet of Things can generally be divided into three categories: audio, video and data. This has given rise to the popular term Triple Play Services. The most important audio applications are VoIP and audio streaming. The most notable video applications are VToIP, IPTV, and video streaming, and the WWW service is the most prominent example of data-type services. This chapter elaborates on the most...
Video of LEGO Bricks on Conveyor Belt Dataset Series
Publication
- T. M. Boiński
- Year 2022
The dataset series titled Video of LEGO bricks on conveyor belt is composed of 14 datasets containing video recordings of a moving white conveyor belt. The recordings were created using a smartphone camera in Full HD resolution. The dataset allows for the preparation of data for neural network training, and building of a LEGO sorting machine that can help builders to organise their collections.

Full text available to download
Postprodukcja nagrania wideo z dźwiękiem dookólnym (Post-production of a video recording with surround sound)
Publication
- Year 2009
One of the aims of this paper is to present issues related to audio-video correlation. This is presented on the basis of a short film realization employing surround microphone techniques. First, some related works in the domain of sound and vision correlation are presented. Then assumptions concerning scene creation related to both audio and video are briefly described. Another objective is to discuss results of subjective tests...
Recovering Sound Produced by Wind Turbine Structures Employing Video Motion Magnification
Publication
- Year 2019
The recordings were made with a fast video camera and with a microphone. Using fast cameras allowed for observation of the micro vibrations of the object structure. Motion-magnified video recordings of wind turbines on a wind farm were made for the purpose of building a damage prediction system. An idea was to use video to recover sound & vibrations in order to obtain a contactless diagnostic method for wind turbines. The recovered signals...

Full text to download in external service
Reduction of parasitic pitch variations in archival musical recordings
Publication
- SIGNAL PROCESSING - Year 2010
A new method for reducing parasitic pitch variations in archival audio recordings is presented. The method is intended for analyzing movie soundtracks recorded in optical films. It utilizes image processing for calculating and reducing effects of tape shrinkage being one of the main reasons for parasitic pitch variations in audio accompanying moving images. As long as the film tape characteristics are known the new method can be...

Full text available to download
The American Sign Language alphabet
Open Research Data
open access
- S. Olewniczak
- K. Witczak
- I. Czartowski
- H. Wołek
The American Sign Language dataset contains all static letters of the American alphabet, meaning those that do not require movement to perform (the entire alphabet except for the letters 'J' and 'Z', which are dynamic and require hand movement).
Video analytics-based algorithm for monitoring egress from buildings
Publication
- M. Szczodrak
- A. Czyżewski
- MULTIMEDIA TOOLS AND APPLICATIONS - Year 2016
A concept and a practical implementation of the algorithm for detecting potentially dangerous situations related to crowding in passages is presented. An example of such a situation is a crush which may be caused by an obstructed pedestrian pathway. The surveillance video camera signal analysis performed in the online mode is employed in order to detect hold-ups near bottlenecks like doorways or staircases. The details of the...

Full text available to download
Video Analytics-Based Algorithm for Monitoring Egress from Buildings
Publication
- M. Szczodrak
- A. Czyżewski
- Year 2013
A concept and practical implementation of the algorithm for detecting potentially dangerous situations of crowding in passages is presented. An example of such a situation is a crush which may be caused by an obstructed pedestrian pathway. Surveillance video camera signal analysis performed online is employed in order to detect hold-ups near bottlenecks like doorways or staircases. The details of the implemented algorithm which uses...

Full text to download in external service
Localization of impulsive disturbances in audio signals using template matching
Publication
- M. Niedźwiecki
- M. Ciołek
- DIGITAL SIGNAL PROCESSING - Year 2015
In this paper, a new solution to the problem of elimination of impulsive disturbances from audio signals, based on the matched filtering technique, is proposed. The new approach stems from the observation that a large proportion of noise pulses corrupting audio recordings have highly repetitive shapes that match several typical “patterns”. In many cases a representative set of exemplary pulse waveforms can be extracted from the...

Full text available to download
Elimination of Impulsive Disturbances From Archive Audio Signals Using Bidirectional Processing
Publication
- M. Niedźwiecki
- M. Ciołek
- IEEE Transactions on Audio Speech and Language Processing - Year 2013
In this application-oriented paper we consider the problem of elimination of impulsive disturbances, such as clicks, pops and record scratches, from archive audio recordings. The proposed approach is based on bidirectional processing—noise pulses are localized by combining the results of forward-time and backward-time signal analysis. Based on the results of specially designed empirical tests (rather than on the results of theoretical analysis),...

Full text available to download
An extension to the FEEDB Multimodal Database of Facial Expressions and Emotions
Publication
- M. Szwoch
- L. Marco-Gimenez
- M. Arevalillo-Herráez
- A. Ayesh
- Year 2015
FEEDB is a multimodal database that contains recordings of people expressing different emotions, captured by using a Microsoft Kinect sensor. Data were originally provided in the device’s proprietary format (XED), requiring both the Microsoft Kinect Studio application and a Kinect sensor attached to the system to use the files. In this paper, we present an extension of the database. For a selection of recordings, we also provide...

Full text to download in external service
Building Knowledge for the Purpose of Lip Speech Identification
Publication
- Advances in Intelligent Systems and Computing - Year 2017
Consecutive stages of building knowledge for automatic lip speech identification are shown in this study. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared taking into account the frequencies of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...

Full text to download in external service
FEEDB: A multimodal database of facial expressions and emotions
Publication
- M. Szwoch
- Year 2013
In this paper, a first version of the multimodal FEEDB database of facial expressions and emotions is presented. The database contains labeled RGB-D recordings of people expressing a specific set of expressions that have been recorded using a Microsoft Kinect sensor. Such a database can be used for classifier training and testing in face recognition as well as in recognition of facial expressions and human emotions. Also initial experiences...

Full text to download in external service
Sparse vector autoregressive modeling of audio signals and its application to the elimination of impulsive disturbances
Publication
- M. Niedźwiecki
- M. Ciołek
- Year 2015
Archive audio files are often corrupted by impulsive disturbances, such as clicks, pops and record scratches. This paper presents a new method for elimination of impulsive disturbances from stereo audio signals. The proposed approach is based on a sparse vector autoregressive signal model, made up of two components: one taking care of short-term signal correlations, and the other one taking care of long-term correlations. The method...

Full text to download in external service
Online sound restoration system for digital library applications
Publication
- Year 2013
Audio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Janssen interpolation method based on the autoregressive model (AR) combined with the iterative restoring of missing signal samples. Since the distortion...

Full text to download in external service
1D convolutional context-aware architectures for acoustic sensing and recognition of passing vehicle type
Publication
- Year 2020
A network architecture that may be employed for sensing and recognizing the type of a vehicle on the basis of audio recordings made in the proximity of a road is proposed in the paper. The analyzed road traffic consists of both passenger cars and heavier vehicles. Excerpts from recordings that do not contain the sounds of passing vehicles are also taken into account and marked as containing silence....
Detection of impulsive disturbances in archive audio signals
Publication
- M. Ciołek
- M. Niedźwiecki
- Year 2017
In this paper the problem of detection of impulsive disturbances in archive audio signals is considered. It is shown that semi-causal/noncausal solutions based on joint evaluation of signal prediction errors and leave-one-out signal interpolation errors, allow one to noticeably improve detection results compared to the prediction-only based solutions. The proposed approaches are evaluated on a set of clean audio signals contaminated...

Full text available to download
Musical Instrument Tagging Using Data Augmentation and Effective Noisy Data Processing
Publication
- D. Koszewski
- B. Kostek
- JOURNAL OF THE AUDIO ENGINEERING SOCIETY - Year 2020
Developing signal processing methods to extract information automatically has potential in several applications, for example searching for multimedia based on its audio content, making context-aware mobile applications (e.g., tuning apps), or pre-processing for an automatic mixing system. However, the last-mentioned application needs a significant amount of research to reliably recognize real musical instruments in recordings....

Full text available to download
Online sound restoration system for digital library applications.
Publication
- Journal of the Acoustical Society of America - Year 2013
Audio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Janssen interpolation method based on the autoregressive model (AR) combined with the iterative restoring of missing signal samples. Since the distortion...
Intelligent multimedia solutions supporting special education needs.
Publication
- A. Czyżewski
- B. Kostek
- LECTURE NOTES IN COMPUTER SCIENCE - Year 2011
The role of computers in school education is briefly discussed. The history of multimodal interface development is briefly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including an interactive electronic whiteboard based on video image analysis, an application for controlling computers with facial expressions, and a speech-stretching audio interface representing the audio modality....
Exploiting audio-visual correlation by means of gaze tracking
Publication
- B. Kunka
- B. Kostek
- International Journal of Computer Science and Applications - Year 2010
This paper presents a novel means for increasing audio-visual correlation analysis reliability. This is done based on gaze tracking technology engineered at the Multimedia Systems Department of the Gdansk University of Technology, Poland. In the paper, the history of and current research in the area of audio-visual perception analysis are briefly reviewed. Then the methodology employing gaze tracking is presented along with the...

Full text available to download
Rozproszone przechowywanie zapasowych kopii danych (Distributed storage of backup data copies)
Publication
- J. Kuchta
- Year 2012
A method is shown for using a distributed processing system to protect an institution against the effects of a hacker attack combined with the destruction of that institution's database. The method consists of interleaving data packets into audio-video material downloaded by internet users of Video-on-Demand film services and storing the data dispersed across hundreds or even thousands of computers.

Full text to download in external service
Creating a Reliable Music Discovery and Recommendation System
Publication
- Year 2014
The aim of this paper is to show problems related to creating a reliable music discovery system. The SYNAT database that contains audio files is used for the purpose of experiments. The files are divided into 22 classes corresponding to music genres with different cardinality. Of utmost importance for a reliable music recommendation system are the assignment of audio files to their appropriate genres and optimum parameterization...

Full text to download in external service
Data obtained via parametrization of differently mixed audio signals
Open Research Data
open access
- J. Stefański
- K. Marciniuk
Dataset consists of audio samples and the results of their parametrization. The extraction of music parameters was performed using MIRToolbox. Information extracted from the samples was used as a database for master's thesis titled 'The influence of audio signal processing chain in mixing on the emotional state of a music piece'.
Further Developments of the Online Sound Restoration System for Digital Library Applications
Publication
- Year 2014
New signal processing algorithms were introduced to the online service for audio restoration available at the web address: www.youarchive.net. Missing or distorted audio samples are estimated using a specific implementation of the Janssen interpolation method. The algorithm is based on the autoregressive model (AR) combined with the iterative complementation of signal samples. Since the interpolation algorithm is computationally...

Full text to download in external service
Polish motorways 2016 - video data
Open Research Data
open access
- Ł. Jeliński
- W. Kustra
- D. Bytner
- series: Polish roads 2016- video data
Polish motorways 2016 - video data
Moving object detection and tracking for the purpose of multimodal surveillance system in urban areas
Publication
- A. Czyżewski
- P. Dalka
- Year 2008
A background subtraction method based on a mixture of Gaussians was employed to detect all regions in a video frame denoting moving objects. Kalman filters were used for establishing relations between the regions and real moving objects in a scene and for tracking them continuously. The objects were represented by rectangles. The coupling of objects with adequate regions, including many-to-many relations, was studied experimentally...
Gaze-tracking based audio-visual correlation analysis employing quality of experience methodology
Publication
- Intelligent Decision Technologies-Netherlands - Year 2010
This paper investigates a new approach to audio-visual correlation assessment based on the gaze-tracking system developed at the Multimedia Systems Department (MSD) of Gdansk University of Technology (GUT). The gaze-tracking methodology, rooted in Human-Computer Interaction, borrows relevance feedback through gaze-tracking and applies it to a new area of interest, Quality of Experience. Results of subjective...

Full text to download in external service
Methodology and technology for the polymodal allophonic speech transcription
Publication
- Journal of the Acoustical Society of America - Year 2016
A method for automatic audiovisual transcription of speech employing acoustic and visual speech representations is developed. It combines audio and visual modalities, which provides a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e. the changes in the articulatory setting of speech organs for...

Full text to download in external service
Methodology and technology for the polymodal allophonic speech transcription
Publication
- Journal of the Acoustical Society of America - Year 2016
A method for automatic audiovisual transcription of speech employing acoustic, electromagnetic articulography and visual speech representations is developed. It combines audio and visual modalities, which provides a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e., the changes in the articulatory...

Full text to download in external service
Polish expressways 2016 - video data
Open Research Data
open access
- Ł. Jeliński
- W. Kustra
- D. Bytner
- series: Polish roads 2016- video data
Polish expressways 2016 - video data
Eye Blink Based Detection of Liveness in Biometric Authentication Systems Using Conditional Random Fields
Publication
- M. Szwoch
- P. Pieniążek
- Year 2012
The goal of this paper was to verify whether conditional random fields are suitable and efficient enough for eye blink detection in user authentication systems based on face recognition with a standard web camera. To evaluate this approach, several experiments were carried out using a specially developed test application and video database.
Localization of impulsive disturbances in archive audio signals using predictive matched filtering
Publication
- M. Niedźwiecki
- M. Ciołek
- Year 2014
The problem of elimination of impulsive disturbances from archive audio signals is considered and its new solution, called predictive matched filtering, is proposed. The new approach is based on the observation that a large percentage of noise pulses corrupting archive audio recordings have highly repetitive shapes that match several typical “patterns”, called click templates. To localize noise pulses, click templates can be correlated...

Full text to download in external service
Measurements of QoS/QoE parameters for media streaming in a PMIPv6 testbed with 802.11 b/g/n WLANs
Publication
- Metrology and Measurement Systems - Year 2012
A growing number of mobile devices and the increasing popularity of multimedia services result in a new challenge of providing mobility in access networks. The paper describes experimental research on media (audio and video) streaming in a mobile IEEE 802.11 b/g/n environment realizing network-based mobility. It is an approach to mobility that requires little or no modification of the mobile terminal. Assessment of relevant parameters...

Full text available to download
Polish voivodeship roads 2016 - video data
Open Research Data
open access
- Ł. Jeliński
- W. Kustra
- D. Bytner
- series: Polish roads 2016- video data
Polish voivodeship roads 2016 - video data
A new method of audio-visual correlation analysis
Publication
- B. Kunka
- B. Kostek
- Year 2009
This paper presents a new methodology for conducting audio-visual correlation analysis employing a gaze tracking system. The interaction and mutual reinforcement between two perceptual modalities, seeing and hearing, has been the subject of many research studies. An earlier stage of the experiments carried out at the Multimedia Systems Department (MSD) showed that there exists a relationship between...

Full text to download in external service
Determining Pronunciation Differences in English Allophones Utilizing Audio Signal Parameterization
Publication
- B. Kostek
- M. Piotrowska
- T. Ciszewski
- A. Czyżewski
- Year 2017
An allophonic description of English plosive consonants, based on audio-visual recordings of 600 specially selected words, was developed. First, several speakers were recorded while reading words from a teleprompter. Then, every word was played back from the previously recorded sample read by a phonology expert and each examined speaker repeated a particular word trying to imitate correct pronunciation. The next step consisted...
Influence of Low-Level Features Extracted from Rhythmic and Harmonic Sections on Music Genre Classification
Publication
- A. Rosner
- F. Weninger
- B. Schuller
- M. Michalak
- B. Kostek
- Year 2013
We present a comprehensive evaluation of the influence of 'harmonic' and rhythmic sections contained in an audio file on automatic music genre classification. The study is performed using the ISMIS database composed of music files, which are represented by vectors of acoustic parameters describing low-level music features. Non-negative Matrix Factorization serves for blind separation of instrument components. Rhythmic components...
Sparse autoregressive modeling
Publication
- M. Ciołek
- Year 2012
In the paper the comparison of the popular pitch determination (PD) algorithms for the purpose of elimination of clicks from archive audio signals using sparse autoregressive (SAR) modeling is presented. The SAR signal representation has been widely used in code-excited linear prediction (CELP) systems. The appropriate construction of the SAR model is required to guarantee model stability. For this reason the signal representation...
Audio content analysis in the urban area telemonitoring system
Publication
- Year 2010
The article presents the possibilities of extending urban monitoring with automatic sound analysis. Sound parameterization methods that can be applied in such a system are presented, and the technical aspects of implementation are discussed. The next part presents the tree-based decision system used in the system. This system recognizes dangerous sounds (gunshot, broken glass, scream) among the recorded sounds...

Full text to download in external service
Selection of Features for Multimodal Vocalic Segments Classification
Publication
- S. Zaporowski
- A. Czyżewski
- Year 2018
English speech recognition experiments are presented employing both audio signal and Facial Motion Capture (FMC) recordings. The principal aim of the study was to evaluate the influence of feature vector dimension reduction on the accuracy of vocalic segment classification employing neural networks. Several parameter reduction strategies were adopted, namely: Extremely Randomized Trees, Principal Component Analysis and Recursive...

Full text to download in external service
Automatic audio signal mixing system based on one-dimensional Wave-U-Net autoencoders
Publication
- D. Koszewski
- Year 2023
The purpose of this dissertation is to develop an automatic song mixing system that is capable of automatically mixing a song with good quality in any music genre. This work recalls first the audio signal processing methods used in audio mixing, and it describes selected methods for automatic audio mixing. Then, a novel architecture built based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. Models...

Full text available to download
Polish national roads 2016 - video data
Open Research Data
open access
- Ł. Jeliński
- W. Kustra
- D. Bytner
- series: Polish roads 2016- video data
The data includes video traffic data recorded with a video camera installed inside a car. The purpose of the research was to gather vehicle traffic recordings in real conditions on Polish national roads.
Visualization of events using various kinds of synchronized data for the Border Guard
Publication
- B. Czaplewski
- S. Kaczmarek
- J. A. Litka
- M. Miszewski
- Zeszyty Naukowe Akademii Marynarki Wojennej - Year 2017
The STRADAR project is dedicated to streaming real-time data in a distributed dispatcher and teleinformation system of the Border Guard. The Events Visualization Post is software designed for simultaneous visualization of data of different types in BG headquarters. The software allows the operator to visualize files, images, SMS, SDS, video, audio, and current or archival data on the naval situation on digital maps. All the visualized...

Full text available to download
Production of six-degrees-of-freedom (6DoF) navigable audio using 30 Ambisonic microphones
Publication
- B. Mróz
- M. Kabaciński
- T. Ciotucha
- A. Rumiński
- T. Żernicki
- Year 2021
This paper describes a method for planning, recording, and post-production of six-degrees-of-freedom audio recorded with multiple 3rd order Ambisonic microphone arrays. The description is based on the example of recordings conducted in August 2020 with the Poznan Philharmonic Orchestra using 30 units of Zylia ZM-1S. A convenient way to prepare and organize such a big project is proposed – this involves details of stage planning,...

Full text to download in external service
Zaawansowane Przetwarzanie Sygnału (Advanced Signal Processing)
e-Learning Courses
- A. Szewczyk
- J. Smulko
The course presents selected signal processing methods across a very wide range of applications. It illustrates the latest achievements in this field, supported by selected publications. Classes are divided into a lecture (15 h) and a seminar (15 h). Basic concepts of digital signal processing, recommended literature. Spectral analysis: power spectral density, wavelet spectrum, polyspectra and cross power spectral density. Effects...
