Search results for: audio

Total: 458

Bimodal deep learning model for subjectively enhanced emotion classification in films
Publication
- D. Weber
- B. Kostek
- INFORMATION SCIENCES - Year 2024
This research delves into the concept of color grading in film, focusing on how color influences the emotional response of the audience. The study commenced by recalling state-of-the-art works that process audio-video signals and associated emotions by machine learning. Then, assumptions of subjective tests for refining and validating an emotion model for assigning specific emotional labels to selected film excerpts were presented....

Full text available for download from an external service
Wow defect reduction based on interpolation techniques
Publication
- P. Maziewski
- Year 2005
The paper presents the results of a study of various interpolation techniques used in wow reduction. The study used linear interpolation, two polynomial interpolation techniques (Hermite and spline), and a technique based on summing windowed sinc functions. The reconstruction was performed on an artificially prepared audio signal, reconstructed with the listed interpolation methods. The reconstruction quality was assessed using...
Creating a Reliable Music Discovery and Recommendation System
Publication
- Year 2014
The aim of this paper is to show problems related to creating a reliable music discovery system. The SYNAT database that contains audio files is used for the purpose of experiments. The files are divided into 22 classes corresponding to music genres with different cardinality. Of utmost importance for a reliable music recommendation system are the assignment of audio files to their appropriate genres and optimum parameterization...

Full text available for download from an external service
Transmitting Alarm Information in DAB+ Broadcasting System
Publication
- P. Falkowski-Gilski
- Year 2018
The main goal of digital broadcasting is to deliver high-quality content with the lowest possible bitrate. This paper is focused on transmitting alarm information, such as emergency warning and alerting, in the DAB+ (Digital Audio Broadcasting plus) broadcasting system. These additional services should be available at the lowest possible bitrate, in order to provide a clear and understandable voice message to people. Furthermore, additional...
Influence of Low-Level Features Extracted from Rhythmic and Harmonic Sections on Music Genre Classification
Publication
- A. Rosner
- F. Weninger
- B. Schuller
- M. Michalak
- B. Kostek
- Year 2013
We present a comprehensive evaluation of the influence of 'harmonic' and rhythmic sections contained in an audio file on automatic music genre classification. The study is performed using the ISMIS database composed of music files, which are represented by vectors of acoustic parameters describing low-level music features. Non-negative Matrix Factorization serves for blind separation of instrument components. Rhythmic components...
EVENTS VISUALIZATION POST IN A DISTRIBUTED TELEINFORMATION SYSTEM FOR THE BORDER GUARD
Publication
- Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne - Year 2017
Events Visualization Post is a part of the STRADAR project, which is dedicated to streaming real-time data in distributed dispatcher and teleinformation systems of the Border Guard. Events Visualization Post is a software designed for simultaneous visualization of data of different types. In the paper, the structure of the software is presented, the process of generation of tasks is described, and the visualization of audio, files,...
Online sound restoration system for digital library applications.
Publication
- Journal of the Acoustical Society of America - Year 2013
Audio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Janssen interpolation method based on the autoregressive model (AR) combined with the iterative restoration of missing signal samples. Since the distortion...
Reduction of parasitic pitch variations in archival musical recordings
Publication
- SIGNAL PROCESSING - Year 2010
A new method for reducing parasitic pitch variations in archival audio recordings is presented. The method is intended for analyzing movie soundtracks recorded in optical films. It utilizes image processing for calculating and reducing effects of tape shrinkage being one of the main reasons for parasitic pitch variations in audio accompanying moving images. As long as the film tape characteristics are known the new method can be...

Full text available for download in the portal
Building Knowledge for the Purpose of Lip Speech Identification
Publication
- Advances in Intelligent Systems and Computing - Year 2017
Consecutive stages of building knowledge for automatic lip speech identification are shown in this study. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared taking into account the frequencies of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...

Full text available for download from an external service
Fitting the mobile device characteristics to the user's hearing preferences
Publication
- Year 2014
A method for fitting the mobile computer audio characteristics to the user's hearing preferences is proposed. The process consists of two stages: calibration and dynamics processing. During the calibration phase the user performs a loudness scaling test, giving their response regarding the perceived loudness. The dynamics processing, performed on this basis, sets the loudness to the most comfortable level. The processing accounts for both...

Full text available for download from an external service
Data, Information, Knowledge, Wisdom Pyramid Concept Revisited in the Context of Deep Learning
Publication
- B. Kostek
- Year 2023
In this paper, the data, information, knowledge, and wisdom (DIKW) pyramid is revisited in the context of deep learning applied to machine learning-based audio signal processing. A discussion on the DIKW schema is carried out, resulting in a proposal that may supplement the original concept. Parallels between DIKW pertaining to audio processing are presented based on examples of the case studies performed by the author and her collaborators....

Full text available for download from an external service
Post-production of a video recording with surround sound (Postprodukcja nagrania wideo z dźwiękiem dookólnym)
Publication
- Year 2009
One of the aims of this paper is to present issues related to audio-video correlation. This is presented on the basis of a short film realization employing surround microphone techniques. First, some related works in the domain of sound and vision correlation are presented. Then assumptions concerning scene creation related to both audio and video are briefly described. Another objective is to discuss results of subjective tests...
1D convolutional context-aware architectures for acoustic sensing and recognition of passing vehicle type
Publication
- Year 2020
A network architecture that may be employed for sensing and recognizing the type of vehicle on the basis of audio recordings made in the proximity of a road is proposed in the paper. The analyzed road traffic consists of both passenger cars and heavier vehicles. Excerpts from recordings that do not contain the sounds of passing vehicles are also taken into account and marked as containing silence....
Evaluation of a Novel Approach to Virtual Bass Synthesis Strategy
Publication
- P. Hoffmann
- B. Kostek
- Year 2015
The aim of this paper is to present a novel approach to the Virtual Bass Synthesis (VBS) strategy applied to portable computers. The developed algorithms involve intelligent, rule-based settings of bass synthesis parameters with regard to music genre of an audio excerpt and the type of a portable device in use. The Smart VBS algorithm performs the synthesis based on a nonlinear device (NLD) with artificial controlling synthesis...

Full text available for download from an external service
Classification of Music Genres by Means of Listening Tests and Decision Algorithms
Publication
- Year 2018
The paper compares the results of audio excerpt assignment to a music genre obtained in listening tests and classification by means of decision algorithms. A short review on music description employing music styles and genres is given. Then, assumptions of listening tests to be carried out along with an online survey for assigning audio samples to selected music genres are presented. A framework for music parametrization is created...

Full text available for download from an external service
Machine learning applied to acoustic-based road traffic monitoring
Publication
- K. Marciniuk
- B. Kostek
- Procedia Computer Science - Year 2022
The motivation behind this study lies in adapting acoustic noise monitoring systems for road traffic monitoring for driver’s safety. Such a system should recognize a vehicle type and weather-related pavement conditions based on the audio level measurement. The study presents the effectiveness of the selected machine learning algorithms in acoustic-based road traffic monitoring. Bases of the operation of the acoustic road traffic...

Full text available for download in the portal
Music genre classification applied to bass enhancement for mobile technology
Publication
- P. Hoffmann
- B. Kostek
- Elektronika : konstrukcje, technologie, zastosowania - Year 2015
The aim of this paper is to present a novel approach to the Virtual Bass Synthesis (VBS) algorithms applied to portable computers. The proposed algorithm is related to intelligent, rule-based setting of synthesis parameters according to music genre of an audio excerpt. The classification of music genres is automatically executed employing MPEG 7 parameters and the Principal Component Analysis method applied to reduce information...

Full text available for download from an external service
Machine learning applied to acoustic-based road traffic monitoring
Publication
- K. Marciniuk
- B. Kostek
- Year 2022
The motivation behind this study lies in adapting acoustic noise monitoring systems for road traffic monitoring for driver’s safety. Such a system should recognize a vehicle type and weather-related pavement conditions based on the audio level measurement. The study presents the effectiveness of the selected machine learning algorithms in acoustic-based road traffic monitoring. Bases of the operation of the acoustic road traffic...

Full text available for download in the portal
Music Data Processing and Mining in Large Databases for Active Media
Publication
- B. Kostek
- P. Hoffmann
- Year 2014
The aim of this paper was to investigate the problem of music data processing and mining in large databases. Tests were performed on a large database that included approximately 30000 audio files divided into 11 classes corresponding to music genres with different cardinalities. Every audio file was described by a 173-element feature vector. To reduce the dimensionality of data the Principal Component Analysis (PCA) with variable...

Full text available for download from an external service
Advanced Signal Processing (Zaawansowane Przetwarzanie Sygnału)
Online Courses
- A. Szewczyk
- J. Smulko
The course presents selected signal processing methods across a very broad range of applications. It illustrates the latest achievements in this field, supported by selected publications. Classes are divided into a lecture (15 h) and a seminar (15 h). Basic concepts of digital signal processing, recommended literature. Spectral analysis: power spectral density, wavelet spectrum, polyspectra, and cross power spectral density. Effects...
Further Developments of the Online Sound Restoration System for Digital Library Applications
Publication
- Year 2014
New signal processing algorithms were introduced to the online service for audio restoration available at the web address: www.youarchive.net. Missing or distorted audio samples are estimated using a specific implementation of the Janssen interpolation method. The algorithm is based on the autoregressive model (AR) combined with the iterative complementation of signal samples. Since the interpolation algorithm is computationally...

Full text available for download from an external service
An Approach to Bass Enhancement in Portable Computers Employing Smart Virtual Bass Synthesis Algorithms
Publication
- Year 2014
The aim of this paper is to present a novel approach to the Virtual Bass Synthesis (VBS) algorithms applied to portable computers. The developed algorithms are related to intelligent, rule-based setting of synthesis parameters according to music genre of an audio excerpt and to the type of a portable device in use. To find optimum synthesis parameters of the VBS algorithms, subjective listening tests based on a parametric procedure...

Full text available for download from an external service
Sparse autoregressive modeling
Publication
- M. Ciołek
- Year 2012
The paper compares popular pitch determination (PD) algorithms for the purpose of eliminating clicks from archive audio signals using sparse autoregressive (SAR) modeling. The SAR signal representation has been widely used in code-excited linear prediction (CELP) systems. The appropriate construction of the SAR model is required to guarantee model stability. For this reason the signal representation...
Innovative method of localization airplanes in VCS (VCS-MLAT) distributed system
Publication
- S. Wiszniewski
- Year 2019
The article presents the concept and the structure of the localization module. The prototype module is part of the VCS (VCS-MLAT) distributed localization system. The device receives the audio signal transmitted in the aviation band (118 MHz – 136 MHz). The received data, together with timestamps, are sent to the main server. The data from multiple devices are used to estimate the location of the airplane. The main aim of the project is the analysis...
Subjective and Objective Comparative Study of DAB+ Broadcast System
Publication
- P. Falkowski-Gilski
- J. Stefański
- Archives of Acoustics - Year 2017
Broadcasting services seek to optimize their use of bandwidth in order to maximize user’s quality of experience. They aim to transmit high-quality digital speech and music signals at the lowest bitrate. They intend to offer the best quality under available conditions. Due to bandwidth limitations, audio quality is in conflict with the number of transmitted radio programs. This paper analyzes whether the quality of real-time digital...

Full text available for download in the portal
Cross-domain applications of multimodal human-computer interfaces
Publication
- A. Czyżewski
- Year 2015
Developed multimodal interfaces for education applications and for disabled people are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with mouth gestures and audio interface for speech stretching for hearing impaired and stuttering people and intelligent pen allowing for diagnosing and ameliorating developmental dyslexia. The eye-gaze tracking system named...
Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention
Publication
- D. Korzekwa
- R. Barra-Chicote
- S. Zaporowski
- G. Beringer
- J. Lorenzo-Trueba
- A. Serafinowicz
- J. Droppo
- T. Drugman
- B. Kostek
- Year 2021
This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...

Full text available for download in the portal
Methodology and technology for the polymodal allophonic speech transcription
Publication
- Journal of the Acoustical Society of America - Year 2016
A method for automatic audiovisual transcription of speech employing acoustic and visual speech representations is developed. It combines audio and visual modalities, which provides a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e. the changes in the articulatory setting of speech organs for...

Full text available for download from an external service
Methodology and technology for the polymodal allophonic speech transcription
Publication
- Journal of the Acoustical Society of America - Year 2016
A method for automatic audiovisual transcription of speech employing acoustic, electromagnetic articulography, and visual speech representations is developed. It combines audio and visual modalities, which provides a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e., the changes in the articulatory...

Full text available for download from an external service
Sound engineering as our commitment to its creators in Poland
Publication
- B. Kostek
- A. Czyżewski
- Archives of Acoustics - Year 2019
Sound engineering is an interdisciplinary and rapidly expanding domain. It covers many aspects, such as sound perception, studio and sound mastering technology, music information retrieval including content-based search systems and automatic music transcription frameworks, sound synthesis, sound restoration, electroacoustics, and other ones constituting multimedia technology. Moreover, machine learning methods applied to the topics...

Full text available for download from an external service
MODALITY corpus - SPEAKER 17 - SEQUENCE S1
Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 17 - SEQUENCE S4
Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 17 - SEQUENCE S2
Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 17 - SEQUENCE S5
Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 17 - SEQUENCE S3
Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 17 - SEQUENCE S6
Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
Measurements of QoS/QoE parameters for media streaming in a PMIPv6 testbed with 802.11 b/g/n WLANs
Publication
- Metrology and Measurement Systems - Year 2012
A growing number of mobile devices and the increasing popularity of multimedia services result in a new challenge of providing mobility in access networks. The paper describes experimental research on media (audio and video) streaming in a mobile IEEE 802.11 b/g/n environment realizing network-based mobility. It is an approach to mobility that requires little or no modification of the mobile terminal. Assessment of relevant parameters...

Full text available for download in the portal
Data Analysis in Bridge of Data
Publication
- Year 2022
The chapter presents the data analysis aspects of the Bridge of Data project. The software framework used, Jupyter, and its configuration are presented. The solution’s architecture, including the TRYTON supercomputer as the underlying infrastructure, is described. The use case templates provided by the Stat-reducer application are presented, including data analysis related to spatial points’ cloud-, audio- and wind-related research.

Full text available for download in the portal
MODALITY corpus - SPEAKER 03 - COMMANDS C6
Research Data
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 27 - SEQUENCE S1
Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 42 - COMMANDS C1
Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 03 - SEQUENCE S2
Research Data
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 03 - SEQUENCE S6
Research Data
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 10 - COMMANDS C1
Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 41 - SEQUENCE S1
Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 37 - COMMANDS C1
Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 03 - SEQUENCE S3
Research Data
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 34 - SEQUENCE S1
Research Data
- series: MODALITY corpus
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 03 - SEQUENCE S4
Research Data
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 03 - COMMANDS C5
Research Data
The MODALITY corpus is one of the multimodal databases of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
