Publikacje
Filtry
wszystkich: 348
Katalog Publikacji
Rok 2023
-
Combining MUSHRA Test and Fuzzy Logic in the Evaluation of Benefits of Using Hearing Prostheses
PublikacjaAssessing the effectiveness of hearing aid fittings based on the benefits they provide is crucial but intricate. While objective metrics of hearing aids like gain, frequency response, and distortion are measurable, they do not directly indicate user benefits. Hearing aid performance assessment encompasses various aspects, such as compensating for hearing loss and user satisfaction. The authors suggest enhancing the widely used...
-
Applying the Lombard Effect to Speech-in-Noise Communication
PublikacjaThis study explored how the Lombard effect, a natural or artificial increase in speech loudness in noisy environments, can improve speech-in-noise communication. This study consisted of several experiments that measured the impact of different types of noise on synthesizing the Lombard effect. The main steps were as follows: first, a dataset of speech samples with and without the Lombard effect was collected in a controlled setting;...
Rok 2022
-
Musical Instrument Identification Using Deep Learning Approach
PublikacjaThe work aims to propose a novel approach for automatically identifying all instruments present in an audio excerpt using sets of individual convolutional neural networks (CNNs) per tested instrument. The paper starts with a review of tasks related to musical instrument identification. It focuses on tasks performed, input type, algorithms employed, and metrics used. The paper starts with the background presentation, i.e., metadata...
-
Creating a Remote Choir Performance Recording Based on an Ambisonic Approach
PublikacjaThe aim of this paper is three-fold. First, the basics of binaural and ambisonic techniques are briefly presented. Then, details related to audio-visual recordings of a remote performance of the Academic Choir of the Gdańsk University of Technology are shown. Due to the COVID-19 pandemic, artists had a choice, namely, to stay at home and not perform or stay at home and perform. In fact, staying at home brought in the possibility...
-
A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times
PublikacjaObjective assessment of speech intelligibility is a complex task that requires taking into account a number of factors such as different perception of each speech sub-bands by the human hearing sense or different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method used for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way...
Rok 2021
-
Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech
PublikacjaWe propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...
-
Techniki wielokanałowe wykorzystywane w koncertach i nagraniach muzycznych na odległość
PublikacjaW czasie pandemii koronawirusa COVID-19 nowego znaczenia nabrały możliwości transmisji dźwięku z obrazem – zwłaszcza do pracy zdalnej, która w przypadku muzyków jest szczególnym wyzwaniem zarówno w kontekście wspólnych ćwiczeń i prób, jak i koncertów. Wynikła konieczność wieloźródłowego połączenia ujawniła potrzebę uprzestrzennienia dźwięku w celu łatwiejszej lokalizacji źródeł dźwięku. Tworzenie zdalnych nagrań muzycznych stało...
-
KLASYFIKACJA EMOCJI W MUZYCE FILMOWEJ Z WYKORZYSTANIEM TESTÓW SUBIEKTYWNYCH
PublikacjaCelem referatu było przedstawienie testów odsłuchowych, w których zadaniem osób ankietowanych było przypisanie danego fragmentu muzycznego do odpowiedniej klasy emocji. Kolejne kroki eksperymentu obejmowały wybór muzyki filmowej do testów (baza Epidemic Sound), przygotowanie założeń ankiety oraz modelu emocji wykorzystywanych w testach odsłuchowych, jak również konstrukcj ˛e ankiety. Ankieta została zrealizowana za pomoc ˛a formularzy...
-
Highlighting interlanguage phoneme differences based on similarity matrices and convolutional neural network
PublikacjaThe goal of this research is to find a way of highlighting the acoustic differences between consonant phonemes of the Polish and Lithuanian languages. For this purpose, similarity matrices are employed based on speech acoustic parameters combined with a convolutional neural network (CNN). In the first experiment, we compare the effectiveness of the similarity matrices applied to discerning acoustic differences between consonant...
-
Classifying Emotions in Film Music - A Deep Learning Approach
PublikacjaThe paper presents an application for automatically classifying emotions in film music. A model of emotions is proposed, which is also associated with colors. The model created has nine emotional states, to which colors are assigned according to the color theory in film. Subjective tests are carried out to check the correctness of the assumptions behind the adopted emotion model. For that purpose, a statistical analysis of the...
Rok 2020
-
Ranking Speech Features for Their Usage in Singing Emotion Classification
PublikacjaThis paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...
-
Musical Instrument Tagging Using Data Augmentation and Effective Noisy Data Processing
PublikacjaDeveloping signal processing methods to extract information automatically has potential in several applications, for example searching for multimedia based on its audio content, making context-aware mobile applications (e.g., tuning apps), or pre-processing for an automatic mixing system. However, the last-mentioned application needs a significant amount of research to reliably recognize real musical instruments in recordings....
-
Investigating Feature Spaces for Isolated Word Recognition
PublikacjaThe study addresses the issues related to the appropriateness of a two-dimensional representation of speech signal for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and...
-
Improving Objective Speech Quality Indicators in Noise Conditions
PublikacjaThis work aims at modifying speech signal samples and test them with objective speech quality indicators after mixing the original signals with noise or with an interfering signal. Modifications that are applied to the signal are related to the Lombard speech characteristics, i.e., pitch shifting, utterance duration changes, vocal tract scaling, manipulation of formants. A set of words and sentences in Polish, recorded in silence,...
-
Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement
PublikacjaThe Lombard effect is one of the most well-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech...
-
Employing Subjective Tests and Deep Learning for Discovering the Relationship between Personality Types and Preferred Music Genres
PublikacjaThe purpose of this research is two-fold: (a) to explore the relationship between the listeners’ personality trait, i.e., extraverts and introverts and their preferred music genres, and (b) to predict the personality trait of potential listeners on the basis of a musical excerpt by employing several classification algorithms. We assume that this may help match songs according to the listener’s personality in social music networks....
-
Analyzing the Effectiveness of the Brain–Computer Interface for Task Discerning Based on Machine Learning
PublikacjaThe aim of the study is to compare electroencephalographic (EEG) signal feature extraction methods in the context of the effectiveness of the classification of brain activities. For classification, electroencephalographic signals were obtained using an EEG device from 17 subjects in three mental states (relaxation, excitation, and solving logical task). Blind source separation employing independent component analysis (ICA) was...
-
Analiza ruchu drogowego z wykorzystaniem analizy akustycznej
PublikacjaTematyka pracy porusza zagadnienia dotyczące pozyskiwania informacji o ruchu drogowym z wykorzystaniem monitoringu akustycznego. Przybliżono podstawowe techniki nadzoru nad ruchem drogowym. Przedstawiono założenia akustycznego detektora ruchu i zbadano jego skuteczność na trzech płaszczyznach działania – zliczania pojazdów, klasyfikacji rodzajowej i klasyfikacji warunków pogodowych panujących na nawierzchni
-
A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces
PublikacjaIn this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset with the size of more than 10.000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...
Rok 2019
-
Subjective tests for gathering konwledge for applaying color grading to video clips automatically
PublikacjaThe analysis of film music concerning caused emotions may allow for a more accurate adaptation of the color of the film in the context of color grading. Therefore, this paper aims to gather knowledge on the correlation between the applied color palette to a video clip, music associated with a particular shot,and emotions evoked. For that purpose, subjective tests are prepared in which several video clips are presented with...
-
Subjective tests for gathering knowledge for applying color grading to video clips automatically
PublikacjaThe analysis of film music concerning caused emotions may allow for a more accurate adaptation of the color of the film in the context of color grading. Therefore, this paper aims to gather knowledge on the correlation between the applied color palette to a video clip, music associated with a particular shot, and emotions evoked. For that purpose, subjective tests are prepared in which several video clips are presented with or...
-
Speech Analytics Based on Machine Learning
PublikacjaIn this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...
-
Sound engineering as our commitment to its creators in Poland
PublikacjaSound engineering is an interdisciplinary and rapidly expanding domain. It covers many aspects, such as sound perception, studio and sound mastering technology, music information retrieval including content-based search systems and automatic music transcription frameworks, sound synthesis, sound restoration, electroacoustics, and other ones constituting multimedia technology. Moreover, machine learning methods applied to the topics...
-
Relationship between album cover design and music genres.
PublikacjaThe aim of the study is to find out whether there exists a relationship between typographic, compositional and coloristic elements of the music album cover design and music contained in the album. The research study involves basic statistical analysis of the manually extracted data coming from the worldwide album covers. The samples represent 34 different music genres, coming from nine countries from around the world. There are...
-
Recovering Sound Produced by Wind Turbine Structures Employing Video Motion Magnification
PublikacjaThe recordings were made with a fast video camera and with a microphone. Using fast cameras allowed for observation of the micro vibrations of the object structure. Motion-magnified video recordings of wind turbines on a wind farm were made for the purpose of building a damage prediction system. An idea was to use video to recover sound & vibrations in order to obtain a contactless diagnostic method for wind turbines. The recovered signals...
-
Music information retrieval—The impact of technology, crowdsourcing, big data, and the cloud in art.
PublikacjaThe exponential growth of computer processing power, cloud data storage, and crowdsourcing model of gathering data bring new possibilities to music information retrieval (mir) field. Mir is no longer music content retrieval only; the area also comprises the discovery of expressing feelings and emotions contained in music, incorporating other than hearing modalities for helping this issue, users’ profiling, merging music with social...
-
Method for Clustering of Brain Activity Data Derived from EEG Signals
PublikacjaA method for assessing separability of EEG signals associated with three classes of brain activity is proposed. The EEG signals are acquired from 23 subjects, gathered from a headset consisting of 14 electrodes. Data are processed by applying Discrete Wavelet Transform (DWT) for the signal analysis and an autoencoder neural network for the brain activity separation. Processing involves 74 wavelets from 3 DWT families: Coiflets,...
-
MACHINE LEARNING–BASED ANALYSIS OF ENGLISH LATERAL ALLOPHONES
PublikacjaAutomatic classification methods, such as artificial neural networks (ANNs), the k-nearest neighbor (kNN) and selforganizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words was created for that purpose, containing positionally and/or contextually conditioned allophones. For each word, a group of 16 native and non-native speakers were audio-video recorded, from which seven native speakers’...
-
Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech
PublikacjaWe present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...
-
Discovering Rule-Based Learning Systems for the Purpose of Music Analysis
PublikacjaMusic analysis and processing aims at understanding information retrieved from music (Music Information Retrieval). For the purpose of music data mining, machine learning (ML) methods or statistical approach are employed. Their primary task is recognition of musical instrument sounds, music genre or emotion contained in music, identification of audio, assessment of audio content, etc. In terms of computational approach, music databases...
-
Comparison of the effectiveness of automatic EEG signal class separation algorithms
PublikacjaIn this paper, an algorithm for automatic brain activity class identification of EEG (electroencephalographic) signals is presented. EEG signals are gathered from seventeen subjects performing one of the three tasks: resting, watching a music video and playing a simple logic game. The methodology applied consists of several steps, namely: signal acquisition, signal processing utilizing z-score normalization, parametrization and...
-
Comparison of Lithuanian and Polish Consonant Phonemes Based on Acoustic Analysis – Preliminary Results
PublikacjaThe goal of this research is to find a set of acoustic parameters that are related to differences between Polish and Lithuanian language consonants. In order to identify these differences, an acoustic analysis is performed, and the phoneme sounds are described as the vectors of acoustic parameters. Parameters known from the speech domain as well as those from the music information retrieval area are employed. These parameters are...
-
Assessment of the Effectiveness of a Short-term Hearing Aid Use in Patients with Different Degrees of Hearing Loss
PublikacjaThe study presents evaluating the effectiveness of the hearing aid fitting process in the short-term use (7 days). The evaluation method consists of a survey based on the APHAB (Abbreviated Profile of Hearing Aid Benefit) questionnaire. Additional criteria such as a degree of hearing loss, number of hours and days of hearing aid use as well as the user’s experience were also taken into consideration. The outcomes of the benefit...
-
ANALIZA PARAMETRÓW SYGNAŁU MOWY W KONTEKŚCIE ICH PRZYDATNOŚCI W AUTOMATYCZNEJ OCENIE JAKOŚCI EKSPRESJI ŚPIEWU
PublikacjaPraca dotyczy podejścia do parametryzacji w przypadku klasyfikacji emocji w śpiewie oraz porównania z klasyfikacją emocji w mowie. Do tego celu wykorzystano bazę mowy i śpiewu nacechowanego emocjonalnie RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song), zawierającą nagrania profesjonalnych aktorów prezentujących sześć różnych emocji. Następnie obliczono współczynniki mel-cepstralne (MFCC) oraz wybrane deskryptory...
-
ANALIZA KOLORÓW SCEN FILMOWYCH W KONTEKŚCIE COLOR GRADINGU
PublikacjaW artykule przedstawiono zagadnienia związane z kolorowaniem sceny filmowej. W pracy przedyskutowano główne aspekty obróbki koloru obrazu filmowego oraz omówiono definicje pojęć związanych z kolorowaniem sceny, tj.: color correction oraz color gradingu. Opisano teorie psychologii koloru oraz ich praktyczne wykorzystanie w filmie i odniesiono je do podstawowych gatunków filmowych i modeli emocji. Następnie przedyskutowano założenia...
-
An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics
PublikacjaThe speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...
-
A Concept of Automatic Film Color Grading Based on Music Recognition and Evoked Emotions
PublikacjaThe article presents the aspects of the final selection of the color of shots in film production based on the psychology of color. First of all, the elements of color processing, contrast, saturation or white balance in the film shots were presented and the definition of color grading was given. In the second part of the article the analysis of film music was conducted in the context of stimulating appropriate emotions while watching...
Rok 2018
-
ZASTOSOWANIE APLIKACJI INTERNETOWEJ W OCENIE JAKOŚCI DOPASOWANIA APARATÓW SŁUCHOWYCH
PublikacjaW pracy opisano zastosowanie aplikacji internetowej do oceny jakości dopasowania aparatów słuchowych. Metoda oceny polega na badaniu ankietowym, uzupełnionym testem rozumienia słów jednosylabowych w polu swobodnym. Opisywana aplikacja internetowa pozwala na przeprowadzenie badania z dowolnego komputera z dostępem do sieci. Dzięki implementacji metody w postaci aplikacji internetowej, można w systematyczny i uporządkowany sposób...
-
WYKORZYSTANIE SIECI NEURONOWYCH DO SYNTEZY MOWY WYRAŻAJĄCEJ EMOCJE
PublikacjaW niniejszym artykule przedstawiono analizę rozwiązań do rozpoznawania emocji opartych na mowie i możliwości ich wykorzystania w syntezie mowy z emocjami, wykorzystując do tego celu sieci neuronowe. Przedstawiono aktualne rozwiązania dotyczące rozpoznawania emocji w mowie i metod syntezy mowy za pomocą sieci neuronowych. Obecnie obserwuje się znaczny wzrost zainteresowania i wykorzystania uczenia głębokiego w aplikacjach związanych...
-
Towards Audio Signal Equalization Based on Spectral Characteristics of a Listening Room and Music Content Reproduced
PublikacjaThis study presents investigations of the influence of the room acoustics on the frequency characteristic of the audio signal playback. First, the concept of a novel spectral equalization method of the room acoustic conditions is introduced. On the basis of the room spectral response, a system for room acoustics compensation based on an equalizer designed is proposed. The system settings depend on music genre recognized automatically....
-
The influence of time of hearing aid use on auditory perception in various acoustic situations
PublikacjaThe assessment of sound perception in hearing aids, especially in the context of benefits that a prosthesis can bring, is a complex issue. The objective parameters of the hearing aids can easily be determined. These parameters, however, do not always have a direct and decisive influence on the subjective assessment of quality of the patient’s hearing while using a hearing aid. The paper presents the development of a method for...
-
The influence of sound track on the viewer’s emotions and correction of the color in the film
PublikacjaThe article presents the aspects of the final selection of colors in film production based on the emotions caused by the soundtrack of the film. First, the processing of colors, contrast, saturation and white balance of shots in the film was presented. The definition of color grading is also described, i.e. the color changes in the film's views. In the second part of the article, the soundtracks of the film were analyzed, in particular...
-
Sound quality metrics applied to road noise evaluation
PublikacjaRoad noise monitoring systems typically measure sound levels in specific time periods. The more insightful approach suggests to measure also the nature of noise. Sound quality of sounds such as car noise can be objectively evaluated by several parameters. One of them is psychoacoustic annoyance, described by loudness, tone color, and the temporal structure of sound. In this paper the assessment of several sound quality parameters, such...
-
POPRAWA OBIEKTYWNYCH WSKAŹNIKÓW JAKOŚCI MOWY W WARUNKACH HAŁASU
PublikacjaCelem pracy jest modyfikacja sygnału mowy, aby uzyskać zwiększenie poprawy obiektywnych wskaźników jakości mowy po zmiksowaniu sygnału użytecznego z szumem bądź z sygnałem zakłócającym. Wykonane modyfikacje sygnału bazują na cechach mowy lombardzkiej, a w szczególności na efekcie podniesienia częstotliwości podstawowej F0. Sesja nagraniowa obejmowała zestawy słów i zdań w języku polskim, nagrane w warunkach ciszy, jak również w...
-
Objectivization of phonological evaluation of speech elements by means of audio parametrization
PublikacjaThis study addresses two issues related to both machine- and subjective-based speech evaluation by investigating five phonological phenomena related to allophone production. Its aim is to use objective parametrization and phonological classification of the recorded allophones. These allophones were selected as specifically difficult for Polish speakers of English: aspiration, final obstruent devoicing, dark lateral /l/, velar nasal...
-
Machine Learning Applied to Aspirated and Non-Aspirated Allophone Classification—An Approach Based on Audio "Fingerprinting"
PublikacjaThe purpose of this study is to involve both Convolutional Neural Networks and a typical learning algorithm in the allophone classification process. A list of words including aspirated and non-aspirated allophones pronounced by native and non-native English speakers is recorded and then edited and analyzed. Allophones extracted from English speakers’ recordings are presented in the form of two-dimensional spectrogram images and...
-
Listening to Live Music: Life beyond Music Recommendation Systems
PublikacjaThis paper presents first a short review on music recommendation systems based on social collaborative filtering. A dictionary of terms related to music recommendation systems, such as music information retrieval (MIR), Query-by-Example (QBE), Query-by-Category (QBC), music content, music annotating, music tagging, bridging the semantic gap in music domain, etc. is introduced. Bases of music recommender systems are shortly presented,...
-
Investigating Feature Spaces for Isolated Word Recognition
PublikacjaMuch attention is given by researchers to the speech processing task in automatic speech recognition (ASR) over the past decades. The study addresses the issue related to the investigation of the appropriateness of a two-dimensional representation of speech feature spaces for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and timefrequency signal representation...
-
INFLUENCE OF DATA NORMALIZATION ON THE EFFECTIVENESS OF NEURAL NETWORKS APPLIED TO CLASSIFICATION OF PAVEMENT CONDITIONS – CASE STUDY
PublikacjaIn recent years automatic classification employing machine learning seems to be in high demand for tele-informatic-based solutions. An example of such solutions are intelligent transportation systems (ITS), in which various factors are taken into account. The subject of the study presented is the impact of data pre-processing and normalization on the accuracy and training effectiveness of artificial neural networks in the case...
-
In Memoriam Professors Marianna Sankiewicz-Budzyński and Gustaw K.E. Budzyński - Founders of the Polish Audio Engineering
PublikacjaBiography and scientific achievements of Professors Marianna Sankiewicz-Budzyński and Gustaw K.E. Budzyński - Founders of the Polish Audio Engineering.