Search results for: audio parametrization
-
Music Data Processing and Mining in Large Databases for Active Media
Publication: The aim of this paper was to investigate the problem of music data processing and mining in large databases. Tests were performed on a large database that included approximately 30,000 audio files divided into 11 classes corresponding to music genres with different cardinalities. Every audio file was described by a 173-element feature vector. To reduce the dimensionality of data the Principal Component Analysis (PCA) with variable...
-
Camera angle invariant shape recognition in surveillance systems
Publication: A method for human action recognition in surveillance systems is described. Problems within this task are discussed and a solution based on 3D object models is proposed. The idea is presented and some of its limitations are discussed. Shape description methods are introduced along with their main features. The parameterization algorithm used is presented. The classification problem, restricted to binary cases, is discussed. Support vector...
-
Further Developments of the Online Sound Restoration System for Digital Library Applications
Publication: New signal processing algorithms were introduced to the online service for audio restoration available at the web address: www.youarchive.net. Missing or distorted audio samples are estimated using a specific implementation of the Janssen interpolation method. The algorithm is based on the autoregressive (AR) model combined with the iterative complementation of signal samples. Since the interpolation algorithm is computationally...
-
Editor's note and 2018 reviewers
Publication: This note refers to the papers published in 2018, as well as to the series of articles in the special issue: Special Issue on Augmented and Participatory Sound and Music Interaction Using Semantic Audio.
-
Sparse autoregressive modeling
Publication: In the paper the comparison of the popular pitch determination (PD) algorithms for the purpose of elimination of clicks from archive audio signals using sparse autoregressive (SAR) modeling is presented. The SAR signal representation has been widely used in code-excited linear prediction (CELP) systems. The appropriate construction of the SAR model is required to guarantee model stability. For this reason the signal representation...
-
An Approach to Bass Enhancement in Portable Computers Employing Smart Virtual Bass Synthesis Algorithms
Publication: The aim of this paper is to present a novel approach to the Virtual Bass Synthesis (VBS) algorithms applied to portable computers. The developed algorithms are related to intelligent, rule-based setting of synthesis parameters according to music genre of an audio excerpt and to the type of a portable device in use. To find optimum synthesis parameters of the VBS algorithms, subjective listening tests based on a parametric procedure...
-
Innovative method of localization airplanes in VCS (VCS-MLAT) distributed system
Publication: The article presents the concept and the structure of the localization module. The prototype module is part of the VCS (VCS-MLAT) distributed localization system. The device receives the audio signal transmitted in the aviation band (118 MHz – 136 MHz). The received data, with timestamps, are sent to the main server. The data from multiple devices are used to estimate the location of the airplane. The main aim of the project is the analysis...
-
Cross-domain applications of multimodal human-computer interfaces
Publication: Multimodal interfaces developed for educational applications and for disabled people are presented, including an interactive electronic whiteboard based on video image analysis, an application for controlling computers with mouth gestures, an audio interface for speech stretching for hearing-impaired and stuttering people, and an intelligent pen for diagnosing and ameliorating developmental dyslexia. The eye-gaze tracking system named...
-
Subjective and Objective Comparative Study of DAB+ Broadcast System
Publication: Broadcasting services seek to optimize their use of bandwidth in order to maximize user’s quality of experience. They aim to transmit high-quality digital speech and music signals at the lowest bitrate. They intend to offer the best quality under available conditions. Due to bandwidth limitations, audio quality is in conflict with the number of transmitted radio programs. This paper analyzes whether the quality of real-time digital...
-
Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention
Publication: This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...
-
Examining Feature Vector for Phoneme Recognition
Publication: The aim of this paper is to analyze the usability of descriptors from music information retrieval for phoneme analysis. The case study presented consists of several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time- and frequency-domain parameters is selected and discussed in the context of the acoustical characteristics of stop consonants. A toolbox created for this purpose...
-
Methodology and technology for the polymodal allophonic speech transcription
Publication: A method for automatic audiovisual transcription of speech employing acoustic and visual speech representations is developed. It combines audio and visual modalities, which provides a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e. the changes in the articulatory setting of speech organs for...
-
Methodology and technology for the polymodal allophonic speech transcription
Publication: A method for automatic audiovisual transcription of speech employing acoustic, electromagnetic articulography, and visual speech representations is developed. It combines audio and visual modalities, which provides a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e., the changes in the articulatory...
-
Sound engineering as our commitment to its creators in Poland
Publication: Sound engineering is an interdisciplinary and rapidly expanding domain. It covers many aspects, such as sound perception, studio and sound mastering technology, music information retrieval including content-based search systems and automatic music transcription frameworks, sound synthesis, sound restoration, electroacoustics, and other ones constituting multimedia technology. Moreover, machine learning methods applied to the topics...
-
Health outcomes of road-traffic pollution among exposed roadside workers in Rawalpindi City, Pakistan
Publication
-
EVENTS VISUALIZATION POST IN A DISTRIBUTED TELEINFORMATION SYSTEM FOR THE BORDER GUARD
Publication: Events Visualization Post is a part of the STRADAR project, which is dedicated to streaming real-time data in distributed dispatcher and teleinformation systems of the Border Guard. Events Visualization Post is software designed for simultaneous visualization of data of different types. In the paper, the structure of the software is presented, the process of generation of tasks is described, and the visualization of audio, files,...
-
Measurements of QoS/QoE parameters for media streaming in a PMIPv6 testbed with 802.11 b/g/n WLANs
Publication: A growing number of mobile devices and the increasing popularity of multimedia services result in a new challenge of providing mobility in access networks. The paper describes experimental research on media (audio and video) streaming in a mobile IEEE 802.11 b/g/n environment realizing network-based mobility. It is an approach to mobility that requires little or no modification of the mobile terminal. Assessment of relevant parameters...
-
Automatic music signal mixing system based on one-dimensional Wave-U-Net autoencoders
Publication: The purpose of this paper is to show a music mixing system that is capable of automatically mixing separate raw recordings with good quality regardless of the music genre. This work recalls selected methods for automatic audio mixing first. Then, a novel deep model based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. The model is trained on a custom-prepared database. Mixes created using the...
-
Określenie parametrów modelowania geometrii krzyżownic rozjazdów zwyczajnych dla potrzeb budowy i utrzymania linii kolejowych [Determination of parameters for modeling the geometry of crossings in ordinary turnouts for the construction and maintenance of railway lines]
Publication: The vast majority of turnouts on railway lines in Poland are ordinary turnouts with a typical set of parameters. For this reason, the analysis of atypical cases (such as turnouts with variable curvature of the diverging track) may be difficult. A virtual geometric and structural model of a turnout, generated using analytical methods, can be a useful tool in the design, construction, and diagnostics...
-
Data Analysis in Bridge of Data
Publication: The chapter presents the data analysis aspects of the Bridge of Data project. The software framework used, Jupyter, and its configuration are presented. The solution’s architecture, including the TRYTON supercomputer as the underlying infrastructure, is described. The use case templates provided by the Stat-reducer application are presented, including data analysis for research related to spatial point clouds, audio, and wind.
-
Tonality Estimation and Frequency Tracking of Modulated Tonal Components
Publication: A novel method for tonality estimation and frequency tracking of tonal components modulated in frequency and amplitude is presented. The algorithm detects the local maxima of magnitude spectra corresponding to three contiguous frames of a signal and matches them into the tonal track candidates. The magnitude-based and phase-based methods are used to estimate the frequency jumps between spectrum maxima belonging to the tonal track...
-
System for automatic singing voice recognition
Publication: The paper presents a system for automatic recognition of singing voice quality and type. The database and the implemented parameters are presented. The decision algorithm is based on artificial neural networks. The trained decision system achieves an accuracy of approx. 90% in both recognition categories. In addition, statistical methods were used to show that the results of the automatic technical quality assessment system...
-
Expert system for automatic classification and quality assessment of singing voices
Publication.
-
DSP techniques for determining ''Wow'' distortions
Publication: The paper describes algorithms for determining the characteristic of wow distortions in sound. The algorithms are: tracking of mains hum, tracking of the magnetic residue of the high-frequency bias current, and adaptive analysis of the spectral centroid for a selected part of the distorted signal. The presented algorithms are suitable for both software and hardware implementation.
-
New Aspects of Virtual Sound Source Localization Research—Impact of Visual Angle and 3-D Video Content on Sound Perception
Publication: The influence of image on virtual sound source localization, called the “image proximity effect” or the “ventriloquism effect”, is a well-known phenomenon. This paper focuses on other aspects related to this effect, namely the impact of the visual angle of the presented object and 3D video content on sound perception. The research conducted confirmed that the visual angle of the presented object determines the image proximity effect...
-
Measurements and Visualization of Sound Intensity Around the Human Head in Free Field Using Acoustic Vector Sensor
Publication: This paper presents measurements and visualization of sound intensity around the human head simulator in a free field. A Cartesian robot, applied for precise positioning of the acoustic vector sensor, was used to measure sound intensity. Measurements were performed in a free field using a head and torso simulator and the setup consisting of four different loudspeaker configurations. The acoustic vector sensor was positioned around...
-
Bass Enhancement Settings in Portable Devices Based on Music Genre Recognition
Publication: The paper presents a novel approach to the Virtual Bass Synthesis (VBS) applied to mobile devices, called Smart VBS (SVBS). The proposed algorithm uses an intelligent, rule-based setting of bass synthesis parameters adjusted to the particular music genre. Harmonic generation is based on a nonlinear device (NLD) method with the intelligent controlling system adapting to the recognized music genre. To automatically classify music...
-
Analysis of 2D Feature Spaces for Deep Learning-based Speech Recognition
Publication: convolutional neural network (CNN) which is a class of deep, feed-forward artificial neural network. We decided to analyze audio signal feature maps, namely spectrograms, linear and Mel-scale cepstrograms, and chromagrams. The choice was based on the fact that CNNs perform well in 2D data-oriented processing contexts. Feature maps were employed in the Lithuanian word recognition task. The spectral analysis led to the highest word...
-
Music Mood Visualization Using Self-Organizing Maps
Publication: Due to the increasing amount of music being made available in digital form on the Internet, automatic organization of music is sought. The paper presents an approach to graphical representation of the mood of songs based on Self-Organizing Maps. Parameters describing the mood of music are proposed and calculated, and then analyzed employing correlation with mood dimensions based on Multidimensional Scaling. A map is created in which...
-
On the Consumption of Multimedia Content Using Mobile Devices: a Year to Year User Case Study
Publication: In the early days, consumption of multimedia content related to audio signals was only possible in a stationary manner. The music player was located at home, with a necessary physical drive. An alternative way for an individual was to attend a live performance at a concert hall or host a private concert at home. To sum up, audio-visual effects were reserved only for a narrow group of recipients. Today, thanks to portable players,...
-
Subiektywny pomiar jakości sygnałów mowy i muzyki w lokalnych multipleksach radiofonii DAB+ w Gdańsku i Wrocławiu [Subjective quality measurement of speech and music signals in local DAB+ radio multiplexes in Gdańsk and Wrocław]
Publication: DAB+ (Digital Audio Broadcasting plus) digital radio has been available to listeners in Poland since 2013. The standard offers a wide range of options for configuring local multiplexes, in terms of not only the number but also the quality of the broadcast radio programs. This makes it possible to adjust the parameters of the transmitted signals to meet the expectations of end users. In contrast to analog FM radio...
-
Developing a Low SNR Resistant, Text Independent Speaker Recognition System for Intercom Solutions - A Case Study
Publication: This article presents a case study on the development of a biometric voice verification system for an intercom solution, utilizing the DeepSpeaker neural network architecture. Despite the variety of solutions available in the literature, there is a noted lack of evaluations for "text-independent" systems under real conditions and with varying distances between the speaker and the microphone. This article aims to bridge this gap....
-
Vocalic Segments Classification Assisted by Mouth Motion Capture
Publication: Visual features convey important information for automatic speech recognition (ASR), especially in noisy environments. The purpose of this study is to evaluate to what extent visual data (i.e. lip reading) can enhance recognition accuracy in a multimodal approach. For that purpose, motion capture markers were placed on speakers' faces to obtain lip tracking data during speech. Different parameterization strategies were tested...
-
Sample Rate Conversion with Fluctuating Resampling Ratio
Publication: In this paper, sample rate conversion with a continuously changing resampling ratio is presented. The proposed implementation is based on a variable fractional delay filter implemented using a Farrow structure. It has been demonstrated that, using the proposed approach, the instantaneous resampling ratio can be freely changed. This allows for simulation of audio recorded on magnetic tape with nonuniform velocity as well as removal...
-
Online Sound Restoration for Digital Library Applications
Publication: A system for sound restoration was conceived and engineered with the following features: no special sound restoration software is needed by the user to perform audio restoration; the restoration process employs automatic reduction of noise, wow, and impulse distortions performed in the online mode; and no digital signal processing skills are required of the user. The principles of the created system and its features as well...
-
Sample Rate Conversion with Fluctuating Resampling Ratio
Publication: In this paper, sample rate conversion with a continuously changing resampling ratio is presented. The proposed implementation is based on a variable fractional delay filter implemented using a Farrow structure. It has been demonstrated that, using the proposed approach, the instantaneous resampling ratio can be freely changed. This allows for simulation of audio recorded on magnetic tape with nonuniform velocity as well as removal...
-
Visualization of events using various kinds of synchronized data for the Border Guard
Publication: The STRADAR project is dedicated to streaming real-time data in a distributed dispatcher and teleinformation system of the Border Guard. The Events Visualization Post is software designed for simultaneous visualization of data of different types in BG headquarters. The software allows the operator to visualize files, images, SMS, SDS, video, audio, and current or archival data on the naval situation on digital maps. All the visualized...
-
Rozproszone przechowywanie zapasowych kopii danych [Distributed storage of backup data copies]
Publication: A method is shown for using a distributed processing system to protect an institution against the consequences of a hacker attack combined with the destruction of the institution's database. The method consists of interleaving data packets into audio-video material downloaded by Internet users of Video-on-Demand film services and storing the data dispersed across hundreds or even thousands of computers.
-
Scenariusze przepływu pracy sprzężone z automatyczną akwizycją danych [Workflow scenarios coupled with automatic data acquisition]
Publication: The topic of smart workflows is presented. Applications based on smart workflow scenarios are presented: audio system control, monitoring of room environmental conditions, and a dynamic context-aware task list. The component-based architecture of the system is described. The stages that extend the design and implementation process are described. Problems occurring during the execution of these applications are pointed out...
-
Towards Cancer Patients Classification Using Liquid Biopsy
Publication: Liquid biopsy is a useful, minimally invasive diagnostic and monitoring tool for cancer. Yet developing accurate methods, given the potentially large number of input features and the usually small dataset sizes, remains very challenging. Recently, a novel feature parameterization based on RNA-sequenced platelet data, which uses biological knowledge from the Kyoto Encyclopedia of Genes and Genomes, combined with a classifier...
-
Resistant to correlated noise and outliers discrete identification of continuous non-linear non-stationary dynamic objects
Publication: In this article, specific methods of parameter estimation were used to identify the coefficients of continuous models represented by linear and nonlinear differential equations. The necessary discrete-time approximation of the base model is achieved by appropriately tuned FIR linear integral filters. The resulting discrete descriptions, which retain the original continuous parameterization, can then be identified using the classical...
-
Resistant to correlated noise and outliers discrete identification of continuous non-linear non-stationary dynamic objects
Publication: In this study, dedicated methods of parameter estimation were used to identify the coefficients of continuous models represented by linear and nonlinear differential equations. The necessary discrete-time approximation of the base model is achieved by appropriately tuned FIR linear integral filters. The resulting discrete descriptions, which retain the original continuous parameterization, can then be identified using the classical...
-
Discovering Rule-Based Learning Systems for the Purpose of Music Analysis
Publication: Music analysis and processing aim at understanding information retrieved from music (Music Information Retrieval). For the purpose of music data mining, machine learning (ML) methods or statistical approaches are employed. Their primary task is recognition of musical instrument sounds, music genre or emotion contained in music, identification of audio, assessment of audio content, etc. In terms of computational approach, music databases...
-
Gesture-controlled Sound Mixing System With a Sonified Interface
Publication: In this paper the authors present a novel approach to sound mixing. It is realized in a system that enables sound to be mixed with hand gestures recognized in a video stream. The system has been developed in such a way that mixing operations can be performed either with or without visual support. To check the hypothesis that the mixing process needs only an auditory display, the influence of audio information visualization on sound...
-
Implementation Of The Innovative Radiolocalization System VCS-MLAT (Voice Communication System Multilateration)
Publication: In the article the concept of the radiolocalization subsystem of the VHF communication for aviation VCS-MLAT (Voice Communication System – Multilateration) is presented. The distributed localization system can estimate the position of the aircraft using the audio signals from aircraft transmitters in the VHF band (118-136 MHz). This paper shows initial verification of the possibility to use voice airband communication to estimate...
-
KORPUS MOWY ANGIELSKIEJ DO CELÓW MULTIMODALNEGO AUTOMATYCZNEGO ROZPOZNAWANIA MOWY [English speech corpus for multimodal automatic speech recognition]
Publication: The paper presents an audiovisual speech corpus containing 31 hours of English speech recordings. The corpus is intended for automatic audiovisual speech recognition. It contains video recordings from a high-speed stereo camera and sound recorded by a microphone array and a laptop microphone. Thanks to the inclusion of recordings made in noisy conditions, the corpus...
-
Acceleration of decision making in sound event recognition employing supercomputing cluster
Publication: Parallel processing of audio data streams is introduced to shorten the decision making time in hazardous sound event recognition. A supercomputing cluster environment with a framework dedicated to processing multimedia data streams in real time is used. The sound event recognition algorithms employed are based on detecting foreground events, calculating their features in short time frames, and classifying the events with Support...
-
BADANIE JAKOŚCI TRANSMISJI W SYSTEMACH RADIOFONII CYFROWEJ DAB I DAB+ [Study of transmission quality in DAB and DAB+ digital radio systems]
Publication: In the era of digital media, delivering high-quality content is a key element. Among digital radio systems, the DAB and DAB+ (Digital Audio Broadcasting) standards are among the most popular. When configuring a multiplex, proper management of resources within a single radio channel is important. The paper presents the results of subjective tests of transmission quality in DAB and DAB+ systems, carried out...
-
Selection of Features for Multimodal Vocalic Segments Classification
Publication: English speech recognition experiments are presented employing both audio signal and Facial Motion Capture (FMC) recordings. The principal aim of the study was to evaluate the influence of feature vector dimension reduction on the accuracy of vocalic segment classification employing neural networks. Several parameter reduction strategies were adopted, namely: Extremely Randomized Trees, Principal Component Analysis and Recursive...
-
Nauka w świecie cyfrowym okiem młodego inżyniera - strumieniowanie muzyki w sieci [Science in the digital world through the eyes of a young engineer: music streaming on the Web]
Publication: In the early days, consumption of multimedia content, initially related to audio signals, was only possible in a stationary manner. The music player was located at home, together with the necessary physical medium. An alternative for an individual was to attend a live performance at a concert hall or to host a private concert at home. To sum up, audiovisual effects were reserved only for a narrow group of recipients.