prof. dr hab. inż. Bożena Kostek
Employment
- Head of Research Laboratory at Laboratorium Akustyki Fonicznej
- Professor at Laboratorium Akustyki Fonicznej
Publications
Filters
total: 384
Catalog Publications
Year 2024
-
Sounding Mechanism of a Flue Organ Pipe—A Multi-Sensor Measurement Approach
PublicationThis work presents an approach that integrates the results of measuring, analyzing, and modeling air flow phenomena driven by pressurized air in a flue organ pipe. The investigation concerns a Bourdon organ pipe. Measurements are performed in an anechoic chamber using the Cartesian robot equipped with a 3D acoustic vector sensor (AVS) that acquires both acoustic pressure and air particle velocity. Also, a high-speed camera is employed...
Year 2023
-
WYKORZYSTANIE TESTU MUSHRA W BADANIU KORZYŚCI UŻYTKOWANIA PROTEZ SŁUCHOWYCH
PublicationOcena jakości dopasowania aparatów słuchowych w kontekście korzyści, jakie może przy-nieść proteza jest złożonym zagadnieniem. Obiektywne parametry aparatów, które można wy-znaczyć (np. wzmocnienie czy pasmo przenoszenia) nie zawsze mają bezpośredni i decydujący wpływ w subiektywnej ocenie jakości dopasowania protezy słuchowej przez pacjenta. Pomiary efektywności aparatu słuchowego mogą dotyczyć wielu aspektów, między innymi kompensacji...
-
SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM
PublicationThe main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...
-
Rediscovering Automatic Detection of Stuttering and Its Subclasses through Machine Learning—The Impact of Changing Deep Model Architecture and Amount of Data in the Training Set
PublicationThis work deals with automatically detecting stuttering and its subclasses. An effective classification of stuttering along with its subclasses could find wide application in determining the severity of stuttering by speech therapists, preliminary patient diagnosis, and enabling communication with the previously mentioned voice assistants. The first part of this work provides an overview of examples of classical and deep learning...
-
Predicting emotion from color present in images and video excerpts by machine learning
PublicationThis work aims at predicting emotion based on the colors present in images and video excerpts using a machine-learning approach. The purpose of this paper is threefold: (a) to develop a machine-learning algorithm that classifies emotions based on the color present in an image, (b) to select the best-performing algorithm from the first phase and apply it to film excerpt emotion analysis based on colors, (c) to design an online survey...
-
INVESTIGATION OF THE LOMBARD EFFECT BASED ON A MACHINE LEARNING APPROACH
PublicationThe Lombard effect is an involuntary increase in the speaker’s pitch, intensity, and duration in the presence of noise. It makes it possible to communicate in noisy environments more effectively. This study aims to investigate an efficient method for detecting the Lombard effect in uttered speech. The influence of interfering noise, room type, and the gender of the person on the detection process is examined. First, acoustic parameters...
-
Detecting Lombard Speech Using Deep Learning Approach
PublicationRobust Lombard speech-in-noise detecting is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The framework proposed combines convolutional neural networks...
-
Data, Information, Knowledge, Wisdom Pyramid Concept Revisited in the Context of Deep Learning
PublicationIn this paper, the data, information, knowledge, and wisdom (DIKW) pyramid is revisited in the context of deep learning applied to machine learningbased audio signal processing. A discussion on the DIKW schema is carried out, resulting in a proposal that may supplement the original concept. Parallels between DIWK pertaining to audio processing are presented based on examples of the case studies performed by the author and her collaborators....
-
Combining MUSHRA Test and Fuzzy Logic in the Evaluation of Benefits of Using Hearing Prostheses
PublicationAssessing the effectiveness of hearing aid fittings based on the benefits they provide is crucial but intricate. While objective metrics of hearing aids like gain, frequency response, and distortion are measurable, they do not directly indicate user benefits. Hearing aid performance assessment encompasses various aspects, such as compensating for hearing loss and user satisfaction. The authors suggest enhancing the widely used...
-
AUTOMATYCZNA KLASYFIKACJA MOWY PATOLOGICZNEJ
PublicationAplikacja przedstawiona w niniejszym rozdziale służy do automatycznego wykrywania mowy patologicznej na podstawie bazy nagrań. W pierwszej kolejności przedstawiono założenia leżące u podstaw przeprowadzonych badan wraz z wyborem bazy mowy patologicznej. Zaprezentowano również zastosowane algorytmy oraz cechy sygnału mowy, które pozwalają odróżnić mowę niezaburzoną od mowy patologicznej. Wytrenowane sieci neuronowe zostały następnie...
-
Automatic music signal mixing system based on one-dimensional Wave-U-Net autoencoders
PublicationThe purpose of this paper is to show a music mixing system that is capable of automatically mixing separate raw recordings with good quality regardless of the music genre. This work recalls selected methods for automatic audio mixing first. Then, a novel deep model based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. The model is trained on a custom-prepared database. Mixes created using the...
-
Assessing the attractiveness of human face based on machine learning
PublicationThe attractiveness of the face plays an important role in everyday life, especially in the modern world where social media and the Internet surround us. In this study, an attempt to assess the attractiveness of a face by machine learning is shown. Attractiveness is determined by three deep models whose sum of predictions is the final score. Two annotated datasets available in the literature are employed for training and testing...
-
Applying the Lombard Effect to Speech-in-Noise Communication
PublicationThis study explored how the Lombard effect, a natural or artificial increase in speech loudness in noisy environments, can improve speech-in-noise communication. This study consisted of several experiments that measured the impact of different types of noise on synthesizing the Lombard effect. The main steps were as follows: first, a dataset of speech samples with and without the Lombard effect was collected in a controlled setting;...
Year 2022
-
Z PERSPEKTYWY NIECO PONAD 15 LAT DZIAŁALNOŚCI ODDZIAŁU IEEE GDAŃSK COMPUTER SOCIETY (CHAPTER C16) NA WYDZIALE ELEKTRONIKI, TELEKOMUNIKACJI I INFORMATYKI, POLITECHNIKI GDAŃSKIEJ
PublicationW pracy przywołano pokrótce najważniejsze działania, które towarzyszyły powstaniu i funkcjonowaniu Oddziału IEEE Gdańsk Computer Society (Chapter C16). Zaprezentowano skład Zarządu Oddziału w kolejnych kadencjach. Zwrócono uwagę między innymi na rolę Oddziału w promowaniu osiągnięć wybitnych naukowców, prezentujących swoje prace w ramach wykładów, odbywających się pod auspicjami Oddziału, jak też na współudział Oddziału w organizacji...
-
Technologia CyberOko do diagnozy, rehabilitacji i komunikowania się z pacjentami niewykazującymi oznak przytomności
PublicationCyberOko jest rozwiązaniem opracowanym w Politechnice Gdańskiej, które umożliwia nawiązanie kontaktu i pracę z osobami głęboko upośledzonymi komunikacyjnie. W sposób inteligentny śledzi ruch gałek ocznych, dzięki czemu umożliwia rehabilitację i ocenę stanu świadomości pacjenta nawet w stanie całkowitego porażenia. Rozwiązanie obejmuje także analizę fal EEG, obiektywne badanie słuchu i badanie sygnałów z macierzy elektrod wszczepianych...
-
Pursuing Listeners’ Perceptual Response in Audio-Visual Interactions - Headphones vs Loudspeakers: A Case Study
PublicationThis study investigates listeners’ perceptual responses in audio-visual interactions concerning binaural spatial audio. Audio stimuli are coupled with or without visual cues to the listeners. The subjective test participants are tasked to indicate the direction of the incoming sound while listening to the audio stimulus via loudspeakers or headphones with the head-related transfer function (HRTF) plugin. First, the methodology...
-
Pursuing Analytically the Influence of Hearing Aid Use on Auditory Perception in Various Acoustic Situations
PublicationThe paper presents the development of a method for assessing auditory perception and the effectiveness of applying hearing aids for hard-of-hearing people during short-term (up to 7 days) and longer-term (up to 3 months) use. The method consists of a survey based on the APHAB questionnaire. Additional criteria such as the degree of hearing loss, technological level of hearing aids used, as well as the user experience are taken...
-
Noise profiling for speech enhancement employing machine learning models
PublicationThis paper aims to propose a noise profiling method that can be performed in near real-time based on machine learning (ML). To address challenges related to noise profiling effectively, we start with a critical review of the literature background. Then, we outline the experiment performed consisting of two parts. The first part concerns the noise recognition model built upon several baseline classifiers and noise signal features...
-
Musical Instrument Identification Using Deep Learning Approach
PublicationThe work aims to propose a novel approach for automatically identifying all instruments present in an audio excerpt using sets of individual convolutional neural networks (CNNs) per tested instrument. The paper starts with a review of tasks related to musical instrument identification. It focuses on tasks performed, input type, algorithms employed, and metrics used. The paper starts with the background presentation, i.e., metadata...
-
Mining Knowledge of Respiratory Rate Quantification and Abnormal Pattern Prediction
PublicationThe described application of granular computing is motivated because cardiovascular disease (CVD) remains a major killer globally. There is increasing evidence that abnormal respiratory patterns might contribute to the development and progression of CVD. Consequently, a method that would support a physician in respiratory pattern evaluation should be developed. Group decision-making, tri-way reasoning, and rough set–based analysis...
-
Machine learning applied to acoustic-based road traffic monitoring
PublicationThe motivation behind this study lies in adapting acoustic noise monitoring systems for road traffic monitoring for driver’s safety. Such a system should recognize a vehicle type and weather-related pavement conditions based on the audio level measurement. The study presents the effectiveness of the selected machine learning algorithms in acoustic-based road traffic monitoring. Bases of the operation of the acoustic road traffic...
-
Machine learning applied to acoustic-based road traffic monitoring
PublicationThe motivation behind this study lies in adapting acoustic noise monitoring systems for road traffic monitoring for driver’s safety. Such a system should recognize a vehicle type and weather-related pavement conditions based on the audio level measurement. The study presents the effectiveness of the selected machine learning algorithms in acoustic-based road traffic monitoring. Bases of the operation of the acoustic road traffic...
-
Klasyfikacja emocji w muzyce filmowej z wykorzystaniem uczenia głębokiego
PublicationPraca przedstawia zagadnienia związane z klasyfikacją emocji w muzyce filmowej. W artykule zaproponowano model emocji zawierający dziewięć stanów emocjonalnych, do których przypisany jest kolor zgodnie z teorią koloru w filmie. Kolejne kroki eksperymentu obejmowały wybór muzyki filmowej do testów (baza Epidemic Sound), przygotowanie założeń ankiety oraz modelu emocji wykorzystywanych w testach odsłuchowych, a także konstrukcję...
-
Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically
PublicationThe aim of this study is two-fold. First, we perform a series of experiments to examine the interference of different noises on speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Then, we apply this knowledge to detecting speech with the Lombard effect. This is for preparing a dataset for training a machine learning-based...
-
Intelligent Audio Signal Processing − Do We Still Need Annotated Datasets?
PublicationIn this paper, intelligent audio signal processing examples are shortly described. The focus is, however, on the machine learning approach and datasets needed, especially for deep learning models. Years of intense research produced many important results in this area; however, the goal of fully intelligent signal processing, characterized by its autonomous acting, is not yet achieved. Therefore, a review of state-of-the-art concerning...
-
Creating a Remote Choir Performance Recording Based on an Ambisonic Approach
PublicationThe aim of this paper is three-fold. First, the basics of binaural and ambisonic techniques are briefly presented. Then, details related to audio-visual recordings of a remote performance of the Academic Choir of the Gdańsk University of Technology are shown. Due to the COVID-19 pandemic, artists had a choice, namely, to stay at home and not perform or stay at home and perform. In fact, staying at home brought in the possibility...
-
Computer-assisted pronunciation training—Speech synthesis is almost all you need
PublicationThe research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high...
-
Broadening the scope of measurement and analysis of vibrations of an organ pipe employing intensity probe, simulations, and highspeed camera
PublicationThis paper shows an integrated approach to measure, analyze, and model phenomena occurring in an organ pipe driven by pressurized air. The aim of this paper is two-fold, i.e., to measure the pressure signal and the intensity field around the mouth by means of an intensity probe and to visualize and observe the motion of the air jet, which represents the excitation mechanism of the system. This is realized through two techniques,...
-
Analysis-by-synthesis paradigm evolved into a new concept
PublicationThis work aims at showing how the well-known analysis-by-synthesis paradigm has recently been evolved into a new concept. However, in contrast to the original idea stating that the created sound should not fail to pass the foolproof synthesis test, the recent development is a consequence of the need to create new data. Deep learning models are greedy algorithms requiring a vast amount of data that, in addition, should be correctly...
-
Algoritmically improved microwave radar monitors breathing more acurrate than sensorized belt
PublicationThis paper describes a novel way to measure, process, analyze, and compare respiratory signals acquired by two types of devices: a wearable sensorized belt and a microwave radar-based sensor. Both devices provide breathing rate readouts. First, the background research is presented. Then, the underlying principles and working parameters of the microwave radar-based sensor, a contactless device for monitoring breathing, are described....
-
A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times
PublicationObjective assessment of speech intelligibility is a complex task that requires taking into account a number of factors such as different perception of each speech sub-bands by the human hearing sense or different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method used for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way...
Year 2021
-
Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech
PublicationWe propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...
-
Techniki wielokanałowe wykorzystywane w koncertach i nagraniach muzycznych na odległość
PublicationW czasie pandemii koronawirusa COVID-19 nowego znaczenia nabrały możliwości transmisji dźwięku z obrazem – zwłaszcza do pracy zdalnej, która w przypadku muzyków jest szczególnym wyzwaniem zarówno w kontekście wspólnych ćwiczeń i prób, jak i koncertów. Wynikła konieczność wieloźródłowego połączenia ujawniła potrzebę uprzestrzennienia dźwięku w celu łatwiejszej lokalizacji źródeł dźwięku. Tworzenie zdalnych nagrań muzycznych stało...
-
Skuteczność klasyfikacji gatunków muzycznych za pomocą sieci neuronowej w zależności od typu danych wejściowych
PublicationRozpoznawanie gatunku muzycznego jest jednym z podstawowych elementów inteligentnych systemów tworzenia automatycznych list muzyki. Platformy strumieniowe oferujące taką usługę wymagają rozwiązań, które umożliwią jak najdokładniej określić przynależność utworu do gatunku muzycznego. Zgodnie z aktualnym stanem wiedzy – najskuteczniejszym klasyfikatorem są sztuczne sieci neuronowe (w tym w wersji uczenia głębokiego), dla których...
-
Reinforcement Learning Algorithm and FDTD-based Simulation Applied to Schroeder Diffuser Design Optimization
PublicationThe aim of this paper is to propose a novel approach to the algorithmic design of Schroeder acoustic diffusers employing a deep learning optimization algorithm and a fitness function based on a computer simulation of the propagation of acoustic waves. The deep learning method employed for the research is a deep policy gradient algorithm. It is used as a tool for carrying out a sequential optimization process the goal of which is...
-
Mispronunciation Detection in Non-Native (L2) English with Uncertainty Modeling
PublicationA common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare it to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: a) phonemes can be recognized from speech with high accuracy, b) there is a single correct way for a sentence to be pronounced. These assumptions do not always hold, which can result...
-
KLASYFIKACJA EMOCJI W MUZYCE FILMOWEJ Z WYKORZYSTANIEM TESTÓW SUBIEKTYWNYCH
PublicationCelem referatu było przedstawienie testów odsłuchowych, w których zadaniem osób ankietowanych było przypisanie danego fragmentu muzycznego do odpowiedniej klasy emocji. Kolejne kroki eksperymentu obejmowały wybór muzyki filmowej do testów (baza Epidemic Sound), przygotowanie założeń ankiety oraz modelu emocji wykorzystywanych w testach odsłuchowych, jak również konstrukcj ˛e ankiety. Ankieta została zrealizowana za pomoc ˛a formularzy...
-
Introduction to the special issue on machine learning in acoustics
PublicationWhen we started our Call for Papers for a Special Issue on “Machine Learning in Acoustics” in the Journal of the Acoustical Society of America, our ambition was to invite papers in which machine learning was applied to all acoustics areas. They were listed, but not limited to, as follows: • Music and synthesis analysis • Music sentiment analysis • Music perception • Intelligent music recognition • Musical source separation • Singing...
-
How Machine Learning Contributes to Solve Acoustical Problems
PublicationMachine learning is the process of learning functional relationships between measured signals (called percepts in the artificial intelligence literature) and some output of interest. In some cases, we wish to learn very specific relationships from signals such as identifying the language of a speaker (e.g. Zissman, 1996) which has direct applications such as in call center routing or performing a music information retrieval task...
-
Highlighting interlanguage phoneme differences based on similarity matrices and convolutional neural network
PublicationThe goal of this research is to find a way of highlighting the acoustic differences between consonant phonemes of the Polish and Lithuanian languages. For this purpose, similarity matrices are employed based on speech acoustic parameters combined with a convolutional neural network (CNN). In the first experiment, we compare the effectiveness of the similarity matrices applied to discerning acoustic differences between consonant...
-
Evaluation of aspiration problems in L2 English pronunciation employing machine learning
PublicationThe approach proposed in this study includes methods specifically dedicated to the detection of allophonic variation in English. This study aims to find an efficient method for automatic evaluation of aspiration in the case of Polish second-language (L2) English speakers’ pronunciation when whole words are analyzed instead of particular allophones extracted from words. Sample words including aspirated and unaspirated allophones...
-
Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention
PublicationThis paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...
-
Classifying Emotions in Film Music - A Deep Learning Approach
PublicationThe paper presents an application for automatically classifying emotions in film music. A model of emotions is proposed, which is also associated with colors. The model created has nine emotional states, to which colors are assigned according to the color theory in film. Subjective tests are carried out to check the correctness of the assumptions behind the adopted emotion model. For that purpose, a statistical analysis of the...
-
AUTOMATYCZNE GENEROWANIE KOLEJNOŚCI LIST UTWORÓW MUZYCZNYCH
PublicationW niniejszym rozdziale przedstawiono przygotowanie algorytmu do automa-tycznego układania kolejności utworów muzycznych i zgrywającego je do postaci jednego, długiego miksu. Dzięki algorytmowi dobierane są utwory na podstawie analizy podobieństwa fragmentów końcowych i początkowych utworów. Podo-bieństwo to jest obliczane za pomocą odległości euklidesowej między wektorami parametrów wyznaczonymi przez autoenkoder oraz na podstawie...
-
Acoustic Sensing Analytics Applied to Speech in Reverberation Conditions
PublicationThe paper aims to discuss a case study of sensing analytics and technology in acoustics when applied to reverberation conditions. Reverberation is one of the issues that makes speech in indoor spaces challenging to understand. This problem is particularly critical in large spaces with few absorbing or diffusing surfaces. One of the natural remedies to improve speech intelligibility in such conditions may be achieved through speaking...
Year 2020
-
Ranking Speech Features for Their Usage in Singing Emotion Classification
PublicationThis paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...
-
Musical Instrument Tagging Using Data Augmentation and Effective Noisy Data Processing
PublicationDeveloping signal processing methods to extract information automatically has potential in several applications, for example searching for multimedia based on its audio content, making context-aware mobile applications (e.g., tuning apps), or pre-processing for an automatic mixing system. However, the last-mentioned application needs a significant amount of research to reliably recognize real musical instruments in recordings....
-
Investigating Feature Spaces for Isolated Word Recognition
PublicationThe study addresses the issues related to the appropriateness of a two-dimensional representation of speech signal for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and...
-
Improving Objective Speech Quality Indicators in Noise Conditions
PublicationThis work aims at modifying speech signal samples and test them with objective speech quality indicators after mixing the original signals with noise or with an interfering signal. Modifications that are applied to the signal are related to the Lombard speech characteristics, i.e., pitch shifting, utterance duration changes, vocal tract scaling, manipulation of formants. A set of words and sentences in Polish, recorded in silence,...
-
Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement
PublicationThe Lombard effect is one of the most well-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech...
-
Employing Subjective Tests and Deep Learning for Discovering the Relationship between Personality Types and Preferred Music Genres
PublicationThe purpose of this research is two-fold: (a) to explore the relationship between the listeners’ personality trait, i.e., extraverts and introverts and their preferred music genres, and (b) to predict the personality trait of potential listeners on the basis of a musical excerpt by employing several classification algorithms. We assume that this may help match songs according to the listener’s personality in social music networks....
-
Analyzing the Effectiveness of the Brain–Computer Interface for Task Discerning Based on Machine Learning
PublicationThe aim of the study is to compare electroencephalographic (EEG) signal feature extraction methods in the context of the effectiveness of the classification of brain activities. For classification, electroencephalographic signals were obtained using an EEG device from 17 subjects in three mental states (relaxation, excitation, and solving logical task). Blind source separation employing independent component analysis (ICA) was...
-
Analiza ruchu drogowego z wykorzystaniem analizy akustycznej
PublicationTematyka pracy porusza zagadnienia dotyczące pozyskiwania informacji o ruchu drogowym z wykorzystaniem monitoringu akustycznego. Przybliżono podstawowe techniki nadzoru nad ruchem drogowym. Przedstawiono założenia akustycznego detektora ruchu i zbadano jego skuteczność na trzech płaszczyznach działania – zliczania pojazdów, klasyfikacji rodzajowej i klasyfikacji warunków pogodowych panujących na nawierzchni
-
A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces
PublicationIn this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset with the size of more than 10.000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...
Year 2019
-
Subjective tests for gathering konwledge for applaying color grading to video clips automatically
PublicationThe analysis of film music concerning caused emotions may allow for a more accurate adaptation of the color of the film in the context of color grading. Therefore, this paper aims to gather knowledge on the correlation between the applied color palette to a video clip, music associated with a particular shot,and emotions evoked. For that purpose, subjective tests are prepared in which several video clips are presented with...
-
Subjective tests for gathering knowledge for applying color grading to video clips automatically
PublicationThe analysis of film music concerning caused emotions may allow for a more accurate adaptation of the color of the film in the context of color grading. Therefore, this paper aims to gather knowledge on the correlation between the applied color palette to a video clip, music associated with a particular shot, and emotions evoked. For that purpose, subjective tests are prepared in which several video clips are presented with or...
-
Speech Analytics Based on Machine Learning
PublicationIn this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...
-
Sound engineering as our commitment to its creators in Poland
PublicationSound engineering is an interdisciplinary and rapidly expanding domain. It covers many aspects, such as sound perception, studio and sound mastering technology, music information retrieval including content-based search systems and automatic music transcription frameworks, sound synthesis, sound restoration, electroacoustics, and other ones constituting multimedia technology. Moreover, machine learning methods applied to the topics...
-
Relationship between album cover design and music genres.
PublicationThe aim of the study is to find out whether there exists a relationship between typographic, compositional and coloristic elements of the music album cover design and music contained in the album. The research study involves basic statistical analysis of the manually extracted data coming from the worldwide album covers. The samples represent 34 different music genres, coming from nine countries from around the world. There are...
-
Recovering Sound Produced by Wind Turbine Structures Employing Video Motion Magnification
PublicationThe recordings were made with a fast video camera and with a microphone. Using fast cameras allowed for observation of the micro vibrations of the object structure. Motion-magnified video recordings of wind turbines on a wind farm were made for the purpose of building a damage prediction system. An idea was to use video to recover sound & vibrations in order to obtain a contactless diagnostic method for wind turbines. The recovered signals...
-
Music information retrieval—The impact of technology, crowdsourcing, big data, and the cloud in art.
PublicationThe exponential growth of computer processing power, cloud data storage, and crowdsourcing model of gathering data bring new possibilities to music information retrieval (mir) field. Mir is no longer music content retrieval only; the area also comprises the discovery of expressing feelings and emotions contained in music, incorporating other than hearing modalities for helping this issue, users’ profiling, merging music with social...
-
Method for Clustering of Brain Activity Data Derived from EEG Signals
PublicationA method for assessing separability of EEG signals associated with three classes of brain activity is proposed. The EEG signals are acquired from 23 subjects, gathered from a headset consisting of 14 electrodes. Data are processed by applying Discrete Wavelet Transform (DWT) for the signal analysis and an autoencoder neural network for the brain activity separation. Processing involves 74 wavelets from 3 DWT families: Coiflets,...
-
MACHINE LEARNING–BASED ANALYSIS OF ENGLISH LATERAL ALLOPHONES
PublicationAutomatic classification methods, such as artificial neural networks (ANNs), the k-nearest neighbor (kNN) and selforganizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words was created for that purpose, containing positionally and/or contextually conditioned allophones. For each word, a group of 16 native and non-native speakers were audio-video recorded, from which seven native speakers’...
-
Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech
PublicationWe present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...
-
Discovering Rule-Based Learning Systems for the Purpose of Music Analysis
PublicationMusic analysis and processing aims at understanding information retrieved from music (Music Information Retrieval). For the purpose of music data mining, machine learning (ML) methods or statistical approach are employed. Their primary task is recognition of musical instrument sounds, music genre or emotion contained in music, identification of audio, assessment of audio content, etc. In terms of computational approach, music databases...
-
Comparison of the effectiveness of automatic EEG signal class separation algorithms
PublicationIn this paper, an algorithm for automatic brain activity class identification of EEG (electroencephalographic) signals is presented. EEG signals are gathered from seventeen subjects performing one of the three tasks: resting, watching a music video and playing a simple logic game. The methodology applied consists of several steps, namely: signal acquisition, signal processing utilizing z-score normalization, parametrization and...
-
Comparison of Lithuanian and Polish Consonant Phonemes Based on Acoustic Analysis – Preliminary Results
PublicationThe goal of this research is to find a set of acoustic parameters that are related to differences between Polish and Lithuanian language consonants. In order to identify these differences, an acoustic analysis is performed, and the phoneme sounds are described as the vectors of acoustic parameters. Parameters known from the speech domain as well as those from the music information retrieval area are employed. These parameters are...
-
Assessment of the Effectiveness of a Short-term Hearing Aid Use in Patients with Different Degrees of Hearing Loss
PublicationThe study presents evaluating the effectiveness of the hearing aid fitting process in the short-term use (7 days). The evaluation method consists of a survey based on the APHAB (Abbreviated Profile of Hearing Aid Benefit) questionnaire. Additional criteria such as a degree of hearing loss, number of hours and days of hearing aid use as well as the user’s experience were also taken into consideration. The outcomes of the benefit...
-
ANALIZA PARAMETRÓW SYGNAŁU MOWY W KONTEKŚCIE ICH PRZYDATNOŚCI W AUTOMATYCZNEJ OCENIE JAKOŚCI EKSPRESJI ŚPIEWU
PublicationPraca dotyczy podejścia do parametryzacji w przypadku klasyfikacji emocji w śpiewie oraz porównania z klasyfikacją emocji w mowie. Do tego celu wykorzystano bazę mowy i śpiewu nacechowanego emocjonalnie RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song), zawierającą nagrania profesjonalnych aktorów prezentujących sześć różnych emocji. Następnie obliczono współczynniki mel-cepstralne (MFCC) oraz wybrane deskryptory...
-
ANALIZA KOLORÓW SCEN FILMOWYCH W KONTEKŚCIE COLOR GRADINGU
PublicationW artykule przedstawiono zagadnienia związane z kolorowaniem sceny filmowej. W pracy przedyskutowano główne aspekty obróbki koloru obrazu filmowego oraz omówiono definicje pojęć związanych z kolorowaniem sceny, tj.: color correction oraz color gradingu. Opisano teorie psychologii koloru oraz ich praktyczne wykorzystanie w filmie i odniesiono je do podstawowych gatunków filmowych i modeli emocji. Następnie przedyskutowano założenia...
-
An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics
PublicationThe speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...
-
A Concept of Automatic Film Color Grading Based on Music Recognition and Evoked Emotions
PublicationThe article presents the aspects of the final selection of the color of shots in film production based on the psychology of color. First of all, the elements of color processing, contrast, saturation or white balance in the film shots were presented and the definition of color grading was given. In the second part of the article the analysis of film music was conducted in the context of stimulating appropriate emotions while watching...
Year 2018
-
ZASTOSOWANIE APLIKACJI INTERNETOWEJ W OCENIE JAKOŚCI DOPASOWANIA APARATÓW SŁUCHOWYCH
PublicationW pracy opisano zastosowanie aplikacji internetowej do oceny jakości dopasowania aparatów słuchowych. Metoda oceny polega na badaniu ankietowym, uzupełnionym testem rozumienia słów jednosylabowych w polu swobodnym. Opisywana aplikacja internetowa pozwala na przeprowadzenie badania z dowolnego komputera z dostępem do sieci. Dzięki implementacji metody w postaci aplikacji internetowej, można w systematyczny i uporządkowany sposób...
-
WYKORZYSTANIE SIECI NEURONOWYCH DO SYNTEZY MOWY WYRAŻAJĄCEJ EMOCJE
PublicationW niniejszym artykule przedstawiono analizę rozwiązań do rozpoznawania emocji opartych na mowie i możliwości ich wykorzystania w syntezie mowy z emocjami, wykorzystując do tego celu sieci neuronowe. Przedstawiono aktualne rozwiązania dotyczące rozpoznawania emocji w mowie i metod syntezy mowy za pomocą sieci neuronowych. Obecnie obserwuje się znaczny wzrost zainteresowania i wykorzystania uczenia głębokiego w aplikacjach związanych...
-
Towards Audio Signal Equalization Based on Spectral Characteristics of a Listening Room and Music Content Reproduced
PublicationThis study presents investigations of the influence of the room acoustics on the frequency characteristic of the audio signal playback. First, the concept of a novel spectral equalization method of the room acoustic conditions is introduced. On the basis of the room spectral response, a system for room acoustics compensation based on an equalizer designed is proposed. The system settings depend on music genre recognized automatically....
-
The influence of time of hearing aid use on auditory perception in various acoustic situations
PublicationThe assessment of sound perception in hearing aids, especially in the context of benefits that a prosthesis can bring, is a complex issue. The objective parameters of the hearing aids can easily be determined. These parameters, however, do not always have a direct and decisive influence on the subjective assessment of quality of the patient’s hearing while using a hearing aid. The paper presents the development of a method for...
-
The influence of sound track on the viewer’s emotions and correction of the color in the film
PublicationThe article presents the aspects of the final selection of colors in film production based on the emotions caused by the soundtrack of the film. First, the processing of colors, contrast, saturation and white balance of shots in the film was presented. The definition of color grading is also described, i.e. the color changes in the film's views. In the second part of the article, the soundtracks of the film were analyzed, in particular...
-
Sound quality metrics applied to road noise evaluation
PublicationRoad noise monitoring systems typically measure sound levels in specific time periods. The more insightful approach suggests to measure also the nature of noise. Sound quality of sounds such as car noise can be objectively evaluated by several parameters. One of them is psychoacoustic annoyance, described by loudness, tone color, and the temporal structure of sound. In this paper the assessment of several sound quality parameters, such...
-
POPRAWA OBIEKTYWNYCH WSKAŹNIKÓW JAKOŚCI MOWY W WARUNKACH HAŁASU
PublicationCelem pracy jest modyfikacja sygnału mowy, aby uzyskać zwiększenie poprawy obiektywnych wskaźników jakości mowy po zmiksowaniu sygnału użytecznego z szumem bądź z sygnałem zakłócającym. Wykonane modyfikacje sygnału bazują na cechach mowy lombardzkiej, a w szczególności na efekcie podniesienia częstotliwości podstawowej F0. Sesja nagraniowa obejmowała zestawy słów i zdań w języku polskim, nagrane w warunkach ciszy, jak również w...
-
Objectivization of phonological evaluation of speech elements by means of audio parametrization
PublicationThis study addresses two issues related to both machine- and subjective-based speech evaluation by investigating five phonological phenomena related to allophone production. Its aim is to use objective parametrization and phonological classification of the recorded allophones. These allophones were selected as specifically difficult for Polish speakers of English: aspiration, final obstruent devoicing, dark lateral /l/, velar nasal...
-
Machine Learning Applied to Aspirated and Non-Aspirated Allophone Classification—An Approach Based on Audio "Fingerprinting"
PublicationThe purpose of this study is to involve both Convolutional Neural Networks and a typical learning algorithm in the allophone classification process. A list of words including aspirated and non-aspirated allophones pronounced by native and non-native English speakers is recorded and then edited and analyzed. Allophones extracted from English speakers’ recordings are presented in the form of two-dimensional spectrogram images and...
-
Listening to Live Music: Life beyond Music Recommendation Systems
PublicationThis paper presents first a short review on music recommendation systems based on social collaborative filtering. A dictionary of terms related to music recommendation systems, such as music information retrieval (MIR), Query-by-Example (QBE), Query-by-Category (QBC), music content, music annotating, music tagging, bridging the semantic gap in music domain, etc. is introduced. Bases of music recommender systems are shortly presented,...
-
Investigating Feature Spaces for Isolated Word Recognition
PublicationMuch attention is given by researchers to the speech processing task in automatic speech recognition (ASR) over the past decades. The study addresses the issue related to the investigation of the appropriateness of a two-dimensional representation of speech feature spaces for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and timefrequency signal representation...
-
INFLUENCE OF DATA NORMALIZATION ON THE EFFECTIVENESS OF NEURAL NETWORKS APPLIED TO CLASSIFICATION OF PAVEMENT CONDITIONS – CASE STUDY
PublicationIn recent years automatic classification employing machine learning seems to be in high demand for tele-informatic-based solutions. An example of such solutions are intelligent transportation systems (ITS), in which various factors are taken into account. The subject of the study presented is the impact of data pre-processing and normalization on the accuracy and training effectiveness of artificial neural networks in the case...
-
In Memoriam Professors Marianna Sankiewicz-Budzyński and Gustaw K.E. Budzyński - Founders of the Polish Audio Engineering
PublicationBiography and scientific achievements of Professors Marianna Sankiewicz-Budzyński and Gustaw K.E. Budzyński - Founders of the Polish Audio Engineering.
-
Improving the quality of speech in the conditions of noise and interference
PublicationThe aim of the work is to present a method of intelligent modification of the speech signal with speech features expressed in noise, based on the Lombard effect. The recordings utilized sets of words and sentences as well as disturbing signals, i.e., pink noise and the so-called babble speech. Noise signal, calibrated to various levels at the speaker's ears, was played over two loudspeakers located 2 m away from the speaker. In...
-
Examining Feature Vector for Phoneme Recognition
PublicationThe aim of this paper is to analyze usability of descriptors coming from music information retrieval to the phoneme analysis. The case study presented consists in several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...
-
EVALUATION OF SOUND QUALITY FEATURES ON ENVIRONMENTAL NOISE EFFECTS – A CASE STUDY APPLIED TO ROAD TRAFFIC NOISE
PublicationThe paper shows a study on the relationship between noise measures and sound quality (SQ) features that are related to annoyance caused by the traffic noise. First, a methodology to perform analyses related to the traffic noise annoyance is described including references to parameters of the assessment of road noise sources. Next, the measurement setup, location and results are presented along with the derived sound quality features....
-
Eksternalizacja w binauralnej ambisonicznej auralizacji źródeł kierunkowych
PublicationW artykule przedstawiono najważniejsze składniki procesu skutecznego renderowania trójwymiarowego obrazu dźwiękowego za pomocą słuchawek. W tym celu badany jest stopień oddziaływania poszczególnych czynników wpływających na eksternalizację dźwięku: śledzenie położenia głowy (ang. head tracking), indywidualne funkcje przenoszenia głowy (HRTF – Head Related Transfer Function, odnoszące się do matematycznej funkcji propagacji dźwięku...
-
Editor's note and 2018 reviewers
PublicationPrzedmiotem pracy jest odniesienie do prac opublikowanych w 2018 roku, jak również do serii artykułów w ramach specjalnego wydania: Special Issue on Augmented and Participatory Sound and Music Interaction Using Semantic Audio.
-
Comparative analysis of various transformation techniques for voiceless consonants modeling
PublicationIn this paper, a comparison of various transformation techniques, namely Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT) and Discrete Walsh Hadamard Transform (DWHT) are performed in the context of their application to voiceless consonant modeling. Speech features based on these transformation techniques are extracted. These features are mean and derivative values of cepstrum coefficients, derived from each transformation....
-
Comparative analysis of spectral and cepstral feature extraction techniques for phoneme modelling
PublicationPhoneme parameter extraction framework based on spectral and cepstral parameters is proposed. Using this framework, the phoneme signal is divided into frames and Hamming window is used. The performances are evaluated for recognition of Lithuanian vowel and semivowel phonemes. Different feature sets without noise as well as at different level of noise are considered. Two classical machine learning methods (Naive Bayes and Support...
-
Classification of Music Genres by Means of Listening Tests and Decision Algorithms
PublicationThe paper compares the results of audio excerpt assignment to a music genre obtained in listening tests and classification by means of decision algorithms. A short review on music description employing music styles and genres is given. Then, assumptions of listening tests to be carried out along with an online survey for assigning audio samples to selected music genres are presented. A framework for music parametrization is created...
-
Bimodal classification of English allophones employing acoustic speech signal and facial motion capture
PublicationA method for automatic transcription of English speech into International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent the visual data related to lip reading can enhance recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...
-
Automatic music genre classification based on musical instrument track separation / Automatyczna klasyfikacja gatunku muzycznego wykorzystująca algorytm separacji dźwięku instrumentó muzycznych
PublicationThe aim of this article is to investigate whether separating music tracks at the pre-processing phase and extending feature vector by parameters related to the specific musical instruments that are characteristic for the given musical genre allow for efficient automatic musical genre classification in case of database containing thousands of music excerpts and a dozen of genres. Results of extensive experiments show that the approach...
-
Automatic Clustering of EEG-Based Data Associated with Brain Activity
PublicationThe aim of this paper is to present a system for automatic assigning electroencephalographic (EEG) signals to appropriate classes associated with brain activity. The EEG signals are acquired from a headset consisting of 14 electrodes placed on skull. Data gathered are first processed by the Independent Component Analysis algorithm to obtain estimates of signals generated by primary sources reflecting the activity of the brain....
-
Audio-visual aspect of the Lombard effect and comparison with recordings depicting emotional states.
PublicationIn this paper an analysis of audio-visual recordings of the Lombard effect is shown. First, audio signal is analyzed indicating the presence of this phenomenon in the recorded sessions. The principal aim, however, was to discuss problems related to extracting differences caused by the Lombard effect, present in the video , i.e. visible as tension and work of facial muscles aligned to an increase in the intensity of the articulated...
-
AUDIO SIGNAL EQUALIZATION BASED ON IMPULSE RESPONSE OF A LISTENING ROOM AND MUSIC CONTENT REPRODUCED
PublicationA research study presents investigations of the influence of the room acoustics on the frequency characteristic of the audio signal playback. First, a concept of a novel spectral equalization method of the room acoustic conditions is introduced. On the basis of the room spectral response, a system for room acoustics compensation based on an equalizer designed is proposed. The system settings depend on music genre recognized automatically....
-
Aparat słuchowy a alternatywne urządzenia poprawiające słyszenie
PublicationW opracowaniu dokonano przeglądu dostępnych prac dotyczących różnych rodzajów urządzeń poprawiających słyszenie, które w szczególnych przypadkach mogą być traktowane jako rozwiązania alternatywne w stosunku do klasycznych aparatów słuchowych. Praca zawiera dyskusję na temat nowego rodzaju aparatu słuchowego wstępnie zaprogramowanego, który może być dystrybuowany korespondencyjnie lub bezpośrednio potencjalnym użytkownikom. Ponadto...
-
Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions
PublicationThe aim of the work is to analyze Lombard speech effect in recordings and then modify the speech signal in order to obtain an increase in the improvement of objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of the Lombard speech, and in particular on the effect of increasing the fundamental frequency...
seen 5089 times