Search results for: AUDIO PROCESSING
-
New Applications of Multimodal Human-Computer Interfaces
Multimodal computer interfaces and examples of their applications to education software and for disabled people are presented. The proposed interfaces include an interactive electronic whiteboard based on video image analysis, an application for controlling computers with gestures and an audio interface for speech stretching for hearing-impaired and stuttering people. Application of the eye-gaze tracking system to awareness...
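Speech stretching slows an utterance down while keeping its pitch. As a rough illustration only, the sketch below implements plain overlap-add (OLA) time-scale modification with NumPy; it is a generic textbook technique, not necessarily the method used in the described interface, and the function name `ola_stretch` and its frame and hop sizes are illustrative assumptions.

```python
import numpy as np


def ola_stretch(x: np.ndarray, factor: float, frame_len: int = 1024,
                synthesis_hop: int = 256) -> np.ndarray:
    """Stretch x in time by `factor` (>1 slows it down) with plain overlap-add.

    Frames are read with a hop of synthesis_hop / factor and written with a hop
    of synthesis_hop, so the duration changes while each frame keeps its pitch.
    Plain OLA can introduce phase artifacts; WSOLA or a phase vocoder reduce them.
    """
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    analysis_hop = max(1, int(round(synthesis_hop / factor)))
    n_frames = 1 + (len(x) - frame_len) // analysis_hop
    out_len = (n_frames - 1) * synthesis_hop + frame_len
    y = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i in range(n_frames):
        a, s = i * analysis_hop, i * synthesis_hop
        y[s:s + frame_len] += x[a:a + frame_len] * window
        norm[s:s + frame_len] += window
    # Divide by the summed windows so overlapping frames do not change the gain;
    # positions with (near) zero window weight are left unscaled.
    norm[norm < 1e-8] = 1.0
    return y / norm
```

For example, `ola_stretch(x, 1.5)` on one second of audio returns about 1.5 times as many samples (slightly fewer, because only whole frames are processed). Plain OLA is chosen here for brevity; a real-time aid for hearing-impaired or stuttering users would more likely rely on WSOLA or a phase vocoder to avoid audible phase artifacts.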
-
Bimodal Emotion Recognition Based on Vocal and Facial Features
Emotion recognition is a crucial aspect of human communication, with applications in fields such as psychology, education, and healthcare. Identifying emotions accurately is challenging, as people use a variety of signals to express and perceive emotions. In this study, we address the problem of multimodal emotion recognition using both audio and video signals, to develop a robust and reliable system that can recognize emotions...
-
Further developments of parameterization methods of audio stream analysis for security purposes
The paper presents an automatic sound recognition algorithm intended for application in an audiovisual security monitoring system. Because of the distributed character of security systems, multiple multimedia streams cannot be observed simultaneously; thus, an automatic recognition algorithm must be introduced. In the paper, a module for the parameterization and automatic detection of audio events is described. The spectral analyses...
-
Bass Enhancement Settings in Portable Devices Based on Music Genre Recognition
The paper presents a novel approach to Virtual Bass Synthesis (VBS) applied to mobile devices, called Smart VBS (SVBS). The proposed algorithm uses an intelligent, rule-based setting of bass synthesis parameters adjusted to the particular music genre. Harmonic generation is based on a nonlinear device (NLD) method, with an intelligent control system that adapts to the recognized music genre. To automatically classify music...
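The NLD principle can be illustrated with a short sketch, assuming a bass signal that has already been low-pass filtered. It uses a half-wave rectifier as the nonlinearity and a genre-to-gain lookup standing in for the rule base; the table `HARMONIC_GAIN_BY_GENRE`, its values and all function names are hypothetical and are not the SVBS parameters.

```python
import numpy as np

# Hypothetical genre-dependent gains for the generated harmonics; the actual
# SVBS rule base is not reproduced here.
HARMONIC_GAIN_BY_GENRE = {"classical": 0.3, "rock": 0.7, "electronic": 1.0}


def half_wave_nld(x: np.ndarray) -> np.ndarray:
    """Half-wave rectifier, one classic NLD used in VBS.

    For a sine at f0 the output contains DC, f0 and the even harmonics
    2*f0, 4*f0, ...; there are no odd harmonics above f0.
    """
    y = np.maximum(x, 0.0)
    return y - y.mean()  # drop the DC offset introduced by rectification


def add_virtual_bass(bass: np.ndarray, genre: str,
                     default_gain: float = 0.5) -> np.ndarray:
    """Add NLD-generated harmonics to a (pre-filtered) bass signal.

    A full VBS chain would also band-limit the harmonics and high-pass the
    original bass that a small loudspeaker cannot reproduce; omitted here.
    """
    gain = HARMONIC_GAIN_BY_GENRE.get(genre, default_gain)
    return bass + gain * half_wave_nld(bass)


def amplitude_at(x: np.ndarray, freq: float, fs: float) -> float:
    """Amplitude of the component at `freq` (exact when freq falls on a bin)."""
    spectrum = np.fft.rfft(x)
    k = int(round(freq * len(x) / fs))
    return 2.0 * np.abs(spectrum[k]) / len(x)
```

For a 50 Hz tone and the hypothetical "rock" gain of 0.7, the output gains a 100 Hz component of about 0.7 · 2/(3π) ≈ 0.15, which lets a small loudspeaker suggest the missing fundamental through its harmonics.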
-
Subjective and Objective Quality Evaluation Study of BPL-PLC Wired Medium
This paper presents results of research on the effectiveness of bi-directional voice transmission in a 6 kV mine cable network using BPL-PLC (Broadband over Power Line - Power Line Communication) technology. It covers both the emergency state of the cable (supply outage with the cable shorted at both ends) and the cable loaded with distorted current waveforms. The narrowband (0.5 MHz–15 MHz) and broadband (two different modes, frequency range of 3 MHz–7.5...
-
Study on CPU and RAM Resource Consumption of Mobile Devices using Streaming Services
Streaming multimedia services have become very popular in recent years, due to the development of wireless networks. With the growing number of mobile devices worldwide, service providers offer dedicated applications that deliver on-demand audio and video content anytime and anywhere. The aim of this study was to compare different streaming services and investigate their impact on the CPU and RAM resources, with respect...
-
Musical Instrument Identification Using Deep Learning Approach
The work aims to propose a novel approach for automatically identifying all instruments present in an audio excerpt using sets of individual convolutional neural networks (CNNs) per tested instrument. The paper starts with a review of tasks related to musical instrument identification. It focuses on tasks performed, input type, algorithms employed, and metrics used. It then presents the background, i.e., metadata...
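The one-network-per-instrument design turns identification into a set of independent binary decisions. A minimal sketch of how such detectors could be combined, assuming each trained CNN is wrapped as a callable that returns the probability that its instrument is present; the `identify_instruments` name and the 0.5 threshold are assumptions.

```python
from typing import Callable, Dict, List

import numpy as np

# Each detector stands in for one trained per-instrument CNN (hypothetical):
# it maps an audio excerpt to P(instrument present).
Detector = Callable[[np.ndarray], float]


def identify_instruments(excerpt: np.ndarray,
                         detectors: Dict[str, Detector],
                         threshold: float = 0.5) -> List[str]:
    """Return the sorted names of all instruments whose detector fires."""
    return sorted(name for name, detect in detectors.items()
                  if detect(excerpt) >= threshold)
```

Because each detector decides independently, any number of instruments (including none) can be reported for one excerpt, which a single softmax classifier would not allow directly.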
-
Architecture Design of a Networked Music Performance Platform for a Chamber Choir
This paper describes an architecture design process for a Networked Music Performance (NMP) platform for medium-sized conducted music ensembles, based on remote rehearsals of the Academic Choir of Gdańsk University of Technology. The issues of real-time remote communication, in-person music performance, and NMP are described. Three iterative steps defining and extending the architecture of the NMP platform with additional features to...
-
Speech Analytics Based on Machine Learning
In this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a machine-learning classifier is discussed. Feature vectors are built on the basis of descriptors coming from the music information...
-
Broadening the scope of measurement and analysis of vibrations of an organ pipe employing intensity probe, simulations, and high-speed camera
This paper shows an integrated approach to measure, analyze, and model phenomena occurring in an organ pipe driven by pressurized air. The aim of this paper is two-fold, i.e., to measure the pressure signal and the intensity field around the mouth by means of an intensity probe and to visualize and observe the motion of the air jet, which represents the excitation mechanism of the system. This is realized through two techniques,...
-
Automatic Breath Analysis System Using Convolutional Neural Networks
Diseases related to the human respiratory system have always been a burden on society as a whole. The situation has become particularly difficult since the outbreak of the COVID-19 pandemic. Even now, however, it is not uncommon for people to consult their doctor too late, after the disease has developed. To protect patients from severe disease, it is recommended that any symptoms disturbing the respiratory system be detected...
-
Automatic Breath Analysis System Using Convolutional Neural Networks
Diseases related to the human respiratory system have always been a burden on society as a whole. The situation has become particularly difficult since the outbreak of the COVID-19 pandemic. Even now, however, it is common for people to consult their doctor too late, after the disease has developed. To protect patients from severe disease, it is recommended that any symptoms disturbing the respiratory system be detected as...
-
TRANSPORT POSSIBILITY FOR MPEG-4/AVC- AND MPEG-2-ENCODED VIDEO DATA IN IPTV: A COMPARISON STUDY
IPTV (Television over IP) is a modern service with great potential to expand. It uses the IP transport platform, which is already in operation worldwide. At the time of writing, two techniques are used to transport the video and audio data of IPTV: MPEG-2 TS and Native RTP. The two techniques clearly influence both quality of service (QoS) and quality of experience (QoE). This paper sets out to demonstrate...
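One measurable difference between the two transport techniques is per-packet header overhead. The arithmetic below assumes IPv4 without options (20 bytes), UDP (8 bytes), a minimal RTP header (12 bytes), 188-byte TS packets with 4-byte headers and the common choice of seven TS packets per datagram; PES headers, adaptation fields, PSI tables and native-RTP payload-format headers are ignored, so the figures are illustrative only and the function names are hypothetical.

```python
# Header sizes in bytes (minimum sizes, no options or extensions).
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12
TS_PACKET = 188
TS_HEADER = 4


def ts_over_rtp_overhead(ts_packets_per_datagram: int = 7) -> float:
    """Fraction of each IP packet spent on IPv4/UDP/RTP and TS headers when
    MPEG-2 TS packets are carried in RTP (PES headers, adaptation fields and
    PSI tables are ignored)."""
    payload = ts_packets_per_datagram * TS_PACKET
    ip_udp_rtp = IPV4_HEADER + UDP_HEADER + RTP_HEADER
    headers = ip_udp_rtp + ts_packets_per_datagram * TS_HEADER
    return headers / (ip_udp_rtp + payload)


def native_rtp_overhead(payload_bytes: int) -> float:
    """Same fraction for native RTP carrying payload_bytes of media data
    (payload-format headers, e.g. for H.264, are ignored)."""
    headers = IPV4_HEADER + UDP_HEADER + RTP_HEADER
    return headers / (headers + payload_bytes)
```

Under these assumptions, TS over RTP spends 68 of 1356 bytes (about 5.0 %) on headers, versus 40 of 1356 bytes (about 2.9 %) for native RTP carrying a 1316-byte payload.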
-
Smart Virtual Bass Synthesis Algorithm Based on Music Genre Classification
The aim of this paper is to present a novel approach to Virtual Bass Synthesis (VBS) algorithms applied to portable computers. The proposed algorithm employs automatic music genre recognition to determine the optimum parameters for the synthesis of additional frequencies. The synthesis was carried out using the nonlinear device (NLD) and phase vocoder (PV) methods, depending on the genre of the music excerpt. Classification of musical...
-
A Review of Emotion Recognition Methods Based on Data Acquired via Smartphone Sensors
In recent years, emotion recognition algorithms have achieved high efficiency, allowing the development of various affective and affect-aware applications. This advancement has taken place mainly on personal computers, which offer the appropriate hardware and sufficient power to process complex data from video, audio, and other channels. However, the increase in computing and communication capabilities of smartphones,...
-
Enhancing voice biometric security: Evaluating neural network and human capabilities in detecting cloned voices
This study assesses speaker verification efficacy in detecting cloned voices, particularly in safety-critical applications such as healthcare documentation and banking biometrics. It compares deep neural networks such as DeepSpeaker with human listeners in recognizing these cloned voices, underlining the severe implications of voice cloning in these sectors. Cloned voices in healthcare could endanger patient safety by...
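Embedding-based verifiers such as DeepSpeaker commonly map each utterance to a fixed-length vector and accept an identity claim when the cosine similarity to the enrolled speaker's embedding reaches a threshold. The sketch below shows only that decision step; the embeddings are assumed to come from some trained model, and the function names and the 0.7 threshold are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("zero-length embedding")
    return float(np.dot(a, b) / denom)


def verify_speaker(enrolled, test, threshold: float = 0.7) -> bool:
    """Accept the claim if the test embedding is close enough to the enrolled one.
    The threshold is a hypothetical operating point, normally tuned on held-out data."""
    return cosine_similarity(enrolled, test) >= threshold
```

The sketch also makes the threat concrete: a cloned voice whose embedding lands above the threshold is accepted exactly like the genuine speaker.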
-
Creating a Remote Choir Performance Recording Based on an Ambisonic Approach
The aim of this paper is three-fold. First, the basics of binaural and ambisonic techniques are briefly presented. Then, details related to audio-visual recordings of a remote performance of the Academic Choir of the Gdańsk University of Technology are shown. Due to the COVID-19 pandemic, artists had a choice: to stay at home and not perform, or to stay at home and perform. In fact, staying at home opened up the possibility...
-
Comparing traffic intensity estimates employing passive acoustic radar and microwave Doppler radar sensor
The purpose of our applied research project is to develop an autonomous road sign with built-in radar devices of our design. In this paper, we show that it is possible to calibrate the acoustic vector sensor so that it can be used to measure traffic volume and count the vehicles involved in the traffic through the analysis of the noise emitted by them. Signals obtained from a Doppler radar are used as a reference source. Although...
-
Objectivization of Audio-Visual Correlation Analysis
Simultaneous perception of audio and visual stimuli often causes the concealment or misrepresentation of information actually contained in these stimuli. Such effects are called the "image proximity effect" or the "ventriloquism effect" in the literature. Until recently, most research carried out to understand their nature was based on subjective assessments. The authors of this paper propose a methodology based on both subjective...
-
Language material for English audiovisual speech recognition system development
The bimodal speech recognition system requires two language samples, one for training and one for testing the algorithms, that precisely depict natural English speech. For the purposes of the audio-visual recordings, a training database of 264 sentences (1730 words without repetitions; 5685 sounds) has been created. The language sample reflects vowel and consonant frequencies in natural speech. The recording material reflects both the...
-
Multimodal human-computer interfaces based on advanced video and audio analysis
The development history of multimodal interfaces is briefly reviewed in the introduction. Examples of applications of multimodal interfaces to education software and for disabled people are presented, including an interactive electronic whiteboard based on video image analysis, an application for controlling computers with mouth gestures and an audio interface for speech stretching for hearing-impaired and stuttering people. The Smart...
-
Buzz-based honeybee colony fingerprint
Non-intrusive remote monitoring has applications in a variety of areas. In industrial surveillance, devices are capable of detecting anomalies that may threaten machine operation. Similarly, agricultural monitoring devices are used to supervise livestock or provide higher yields. Modern IoT devices are often coupled with Machine Learning models, which provide valuable insights into device operation. However, the data...
-
Evaluation of aspiration problems in L2 English pronunciation employing machine learning
The approach proposed in this study includes methods specifically dedicated to the detection of allophonic variation in English. This study aims to find an efficient method for automatic evaluation of aspiration in the case of Polish second-language (L2) English speakers’ pronunciation when whole words are analyzed instead of particular allophones extracted from words. Sample words including aspirated and unaspirated allophones...
-
Audio Feature Analysis for Precise Vocalic Segments Classification in English
An approach to identifying the most meaningful Mel-Frequency Cepstral Coefficients representing selected allophones and vocalic segments for their classification is presented in the paper. For this purpose, experiments were carried out using algorithms such as Principal Component Analysis, Feature Importance, and Recursive Parameter Elimination. The data used were recordings made within the ALOFON corpus containing audio signal...
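PCA is one of the methods named above for finding the most informative MFCCs. A minimal NumPy sketch, assuming an MFCC matrix of shape (frames, coefficients) has already been extracted; the function name `pca_mfcc_ranking` and its ranking rule (absolute loadings on the leading components, weighted by explained variance) are one reasonable choice and not necessarily the one used in the paper.

```python
import numpy as np


def pca_mfcc_ranking(mfcc: np.ndarray, n_components: int = 3):
    """Rank MFCC coefficients by their contribution to the leading principal components.

    mfcc: array of shape (n_frames, n_coeffs).
    Returns (explained-variance ratio of every component, coefficient indices
    sorted from most to least important).
    """
    centered = mfcc - mfcc.mean(axis=0)
    # Rows of vt are the principal directions; s is sorted in descending order.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    ratio = var / var.sum()
    k = min(n_components, vt.shape[0])
    # Weight each coefficient's absolute loading by the variance its component explains.
    importance = (np.abs(vt[:k]) * ratio[:k, None]).sum(axis=0)
    ranking = np.argsort(importance)[::-1]
    return ratio, ranking
```

Because PCA is driven by variance, coefficients with a larger numeric range (often c0) dominate unless the columns are standardized first; whether to standardize is a design decision.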
-
Fully Automated AI-powered Contactless Cough Detection based on Pixel Value Dynamics Occurring within Facial Regions
Increased interest in non-contact evaluation of the health state has led to higher expectations for delivering automated and reliable solutions that can be conveniently used during daily activities. Although some solutions for cough detection exist, they suffer from a series of limitations. Some of them rely on gesture or body pose recognition, which might not be possible in cases of occlusions, closer camera distances or impediments...
-
MACHINE LEARNING–BASED ANALYSIS OF ENGLISH LATERAL ALLOPHONES
Automatic classification methods, such as artificial neural networks (ANNs), k-nearest neighbors (kNN) and self-organizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words was created for that purpose, containing positionally and/or contextually conditioned allophones. Each word was audio-video recorded by a group of 16 native and non-native speakers, from which seven native speakers’...
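Of the classifiers listed, kNN is the simplest to show. The sketch assumes each recorded allophone has already been reduced to a numeric feature vector; the function name `knn_predict`, the two-dimensional training points and the "clear"/"dark" labels (standing in for clear and dark /l/) are hypothetical placeholders.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, query, k: int = 3):
    """Majority vote among the k training vectors closest (Euclidean) to query.

    Ties are broken in favor of the label whose nearest member is closest,
    because Counter.most_common keeps first-encountered order for equal counts.
    """
    train_x = np.asarray(train_x, dtype=float)
    dists = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors for two lateral allophones.
TRAIN_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]]
TRAIN_Y = ["clear", "clear", "clear", "dark", "dark", "dark"]
```

For instance, `knn_predict(TRAIN_X, TRAIN_Y, [0.05, 0.1])` is labeled from its three nearest neighbors, all of which are "clear" in this toy data.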
-
Subjective quality evaluation of 8- and 10-bit MP4-coded video sequences from Netflix
Recently, many researchers have been intensively conducting quality of service (QoS), quality of experience (QoE), and user experience (UX) studies in the field of video analysis. This paper is intended to make a new, complementary contribution to this field. Currently, streaming platforms are key products for delivering video content online. Most often, they use the MP4 video format, which is most widely utilized...
-
Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions
The aim of the work is to analyze the Lombard effect in speech recordings and then to modify the speech signal so as to improve objective speech quality indicators after the useful signal is mixed with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of Lombard speech, and in particular on the effect of increasing the fundamental frequency...
-
INTEGRATED SYSTEM FOR HOME MONITORING OF MEDICAL PARAMETERS OF ELDERLY AND SICK PEOPLE
The proposed solutions aim to support elderly and sick people so that they can live independently for as long as possible, with an increased sense of security that they are being supervised and will not be left without help in the event of a sudden threat to their life. At the same time, the system does not violate their sense of privacy and intimacy, because neither video cameras nor continuous audio listening are used for monitoring. In addition, the collected information...