Search results for: SPEECH INTELLIGIBILITY IMPROVEMENT

Corrupted speech intelligibility improvement using adaptive filter based algorithm

Publication

- Year 2010

A technique for improving the quality of speech signals recorded in strong noise is presented. The proposed algorithm employing adaptive filtration is described and additional possibilities of speech intelligibility improvement are discussed. Results of the tests are presented.

Speech Intelligibility Measurements in Auditorium

Publication

K. Leo

- ACTA PHYSICA POLONICA A - Year 2010

Speech intelligibility was measured in Auditorium Novum at the Technical University of Gdansk (seating capacity 408, volume 3300 m³). Articulation tests were conducted; STI and Early Decay Time (EDT) coefficients were measured. The negative contribution of noise to speech intelligibility was taken into account. Subjective measurements and objective tests reveal high speech intelligibility at most seats in the auditorium. Correlation was found between...

Full text available to download

Influence of modulation detection threshold on speech intelligibility

Publication

K. Leo

- ACTA PHYSICA POLONICA A - Year 2011

Full text available to download

Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning

Publication

K. Kąkol

- Year 2023

The Lombard effect is a phenomenon that results in improved speech intelligibility in noise. Many distinctive features of Lombard speech are recalled in this dissertation. This work proposes the creation of a system capable of improving speech quality and intelligibility in real time, as measured by objective metrics and subjective tests. This system consists of three main components: speech type detection,...

Full text available to download

Detection of dialogue in movie soundtrack for speech intelligibility enhancement

Publication

K. Łopatka

- Year 2014

A method for detecting dialogue in a 5.1 movie soundtrack based on interchannel spectral disparity is presented. The front channel signals (left, right, center) are analyzed in the frequency domain. The selected partials in the center channel signal, which yield high disparity with the left and right channels, are detected as dialogue. Subsequently, the dialogue frequency components are boosted to achieve increased dialogue intelligibility....

Full text to download in external service
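
The detection step described in this abstract can be illustrated with a minimal sketch. It assumes that one analysis frame is already available as three magnitude spectra, and it uses a simple level ratio in dB as the disparity measure. The function name `detect_and_boost_dialogue`, the disparity formula, and the values of `threshold_db` and `boost_db` are assumptions made for illustration; the abstract does not specify them.

```python
import math

def detect_and_boost_dialogue(left, right, center, threshold_db=6.0, boost_db=3.0):
    """Flag center-channel bins that stand out from left/right, then boost them.

    left, right, center: magnitude spectra (same length) of one analysis frame.
    threshold_db and boost_db are illustrative values, not taken from the paper.
    Returns (dialogue_mask, boosted_center).
    """
    if not (len(left) == len(right) == len(center)):
        raise ValueError("spectra must have the same number of bins")
    eps = 1e-12  # avoids taking the log of zero for silent bins
    gain = 10.0 ** (boost_db / 20.0)
    mask, boosted = [], []
    for l_mag, r_mag, c_mag in zip(left, right, center):
        # Assumed disparity measure: center level relative to the louder
        # of the two side channels, expressed in dB.
        disparity_db = 20.0 * math.log10((c_mag + eps) / (max(l_mag, r_mag) + eps))
        is_dialogue = disparity_db > threshold_db
        mask.append(is_dialogue)
        boosted.append(c_mag * gain if is_dialogue else c_mag)
    return mask, boosted

mask, boosted = detect_and_boost_dialogue(
    left=[1.0, 1.0, 0.1], right=[1.0, 0.5, 0.1], center=[1.0, 4.0, 0.1]
)
```

In this toy frame only the middle bin is much stronger in the center channel than in either side channel, so only that bin is flagged and boosted.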

Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit

Publication

- Diagnostic Pathology - Year 2012

Methods developed for real-time time scale modification (TSM) of speech signal are presented. They are based on the non-uniform, speech-rate-dependent SOLA algorithm (Synchronous Overlap and Add). Influence of the proposed method on the intelligibility of speech was investigated for two separate groups of listeners, i.e. hearing impaired children and elderly listeners. It was shown that for the speech with average rate equal to or...

Full text available to download

A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times

Publication

- SENSORS - Year 2022

Objective assessment of speech intelligibility is a complex task that requires taking into account a number of factors, such as the different perception of each speech sub-band by human hearing or the different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method used for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way...

Full text available to download

Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions

Publication

K. Kąkol
G. Korvel
B. Kostek

- Year 2018

The aim of the work is to analyze the Lombard speech effect in recordings and then modify the speech signal in order to improve objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of Lombard speech, and in particular on the effect of increasing the fundamental frequency...

Acoustic Sensing Analytics Applied to Speech in Reverberation Conditions

Publication

- SENSORS - Year 2021

The paper aims to discuss a case study of sensing analytics and technology in acoustics when applied to reverberation conditions. Reverberation is one of the issues that makes speech in indoor spaces challenging to understand. This problem is particularly critical in large spaces with few absorbing or diffusing surfaces. One of the natural remedies to improve speech intelligibility in such conditions may be achieved through speaking...

Full text available to download

Distortion of speech signals in the listening area: its mechanism and measurements

Publication

- Year 2014

The paper deals with the problem of the influence of the number and distribution of loudspeakers in speech reinforcement systems on the quality of publicly addressed voice messages, namely on speech intelligibility in the listening area. Linear superposition of time-shifted broadband waves of the same form and slightly different magnitudes that reach a listener from numerous coherent sources is accompanied by interference effects...

Full text to download in external service

Modeling and Designing Acoustical Conditions of the Interior – Case Study

Publication

- Archives of Acoustics - Year 2016

The primary aim of this research study was to model acoustic conditions of the Courtyard of the Gdańsk University of Technology Main Building, and then to design a sound reinforcement system for this interior. First, results of measurements of the parameters of the acoustic field are presented. Then, the comparison between measured and predicted values using the ODEON program is shown. Collected data indicate a long reverberation...

Full text available to download

Speech Analytics Based on Machine Learning

Publication

- Year 2019

In this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...

Full text to download in external service

PHONEME DISTORTION IN PUBLIC ADDRESS SYSTEMS

Publication

- Year 2015

The quality of voice messages in speech reinforcement and public address systems is often poor. The sound engineering projects of such systems take care of sound intensity and possible reverberation phenomena in public space without, however, considering the influence of acoustic interference related to the number and distribution of loudspeakers. This paper presents the results of measurements and numerical simulations of the...

Noise profiling for speech enhancement employing machine learning models

Publication

K. Kąkol
G. Korvel
B. Kostek

- Journal of the Acoustical Society of America - Year 2022

This paper aims to propose a noise profiling method that can be performed in near real-time based on machine learning (ML). To address challenges related to noise profiling effectively, we start with a critical review of the literature background. Then, we outline the experiment performed, which consists of two parts. The first part concerns the noise recognition model built upon several baseline classifiers and noise signal features...

Full text available to download

Creating new voices using normalizing flows

Publication

P. Biliński
T. Merritt
A. Ezzerg
K. Pokora
S. Cygert
K. Yanagisawa
R. Barra-Chicote
D. Korzekwa

- Year 2022

Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. Firstly, we create an approach for TTS...

Full text available to download

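A method and algorithms of signal modification for supporting speech understanding by people with degraded temporal resolution of hearing [Metoda i algorytmy modyfikacji sygnału do celu wspomagania rozumienia mowy przez osoby z pogorszoną rozdzielczością czasową słuchu]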

Publication

A. Kupryjanow

- Year 2013

The subject of the research carried out in this dissertation is real-time methods of time scale modification (TSM) of speech and the assessment of their influence on the understanding of utterances by people with degraded temporal resolution of hearing. Degraded hearing resolution is one of the symptoms associated with central auditory processing disorders (Central Auditory Processing Disorder, CAPD). In contrast to...

Impact of the glazed roof on acoustics of historic interiors

Publication

A. Kulowski

- Year 2018

The paper discusses the adverse acoustic phenomena occurring in semi-open interiors (courtyards, yards) covered with a glass roof. Particularly negative is the reverberation noise, which leads to the degradation of the utility functions of the resulting spaces. It involves a drastic reduction in the intelligibility of speech, loss of the natural sound of music, problems with the sound system, as well as disturbances in the...

Waveguide model of the hearing aid earmold system

Publication

- Year 2006

Background: The earmold system of the Behind-The-Ear hearing aid is an acoustic system that modifies the spectrum of the propagated sound waves. Improper selection of the earmold system may result in deterioration of sound quality and speech intelligibility. Computer modeling methods may be useful in the process of hearing aid fitting, allowing the physician to examine various earmold system configurations and choose the optimum one...

Full text to download in external service

Waveguide model of the hearing aid earmold system

Publication

- Diagnostic Pathology - Year 2006

Background: The earmold system of the Behind-The-Ear hearing aid is an acoustic system that modifies the spectrum of the propagated sound waves. Improper selection of the earmold system may result in deterioration of sound quality and speech intelligibility. Computer modeling methods may be useful in the process of hearing aid fitting, allowing the physician to examine various earmold system configurations and choose the optimum one...

Full text available to download

Novel 5.1 Downmix Algorithm with Improved Dialogue Intelligibility

Publication

- Year 2013

A new algorithm for 5.1 to stereo downmix is introduced, which addresses the problem of dialogue intelligibility. The algorithm utilizes proposed signal processing algorithms to enhance the intelligibility of movie dialogues, especially in difficult listening conditions or in compromised speaker setup. To account for the latter, a playback configuration utilizing a portable device, i.e. an ultrabook, is examined. The experiments...

Full text to download in external service

Computer-assisted pronunciation training—Speech synthesis is almost all you need

Publication

D. Korzekwa
J. Lorenzo-Trueba
T. Drugman
B. Kostek

- SPEECH COMMUNICATION - Year 2022

The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high...

Full text available to download

Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement

Publication

G. Korvel
K. Kąkol
O. Kurasova
B. Kostek

- IEEE Access - Year 2020

The Lombard effect is one of the most well-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech...

Full text available to download

Optimizing Medical Personnel Speech Recognition Models Using Speech Synthesis and Reinforcement Learning

Publication

A. Czyżewski

- Journal of the Acoustical Society of America - Year 2023

Text-to-Speech synthesis (TTS) can be used to generate training data for building Automatic Speech Recognition models (ASR). Access to medical speech data is limited because it is sensitive data that is difficult to obtain for privacy reasons; TTS can help expand the data set. Speech can be synthesized by mimicking different accents, dialects, and speaking styles that may occur in a medical language. Reinforcement Learning (RL), in the...

Full text available to download

Language Models in Speech Recognition

Publication

J. Daciuk

- Year 2022

This chapter describes language models used in speech recognition. It starts by indicating the role and the place of language models in speech recognition. Measures used to compare language models follow. An overview of n-gram, syntactic, semantic, and neural models is given. It is accompanied by a list of popular software.

Full text to download in external service

Transient detection for speech coding applications

Publication

- International Journal of Computer Science and Network Security - Year 2006

Signal quality in speech codecs may be improved by selecting transients from the speech signal and encoding them using a suitable method. This paper presents an algorithm for transient detection in a speech signal. This algorithm operates in several frequency bands. Transient detection functions are calculated from energy measured in short frames of the signal. The final selection of transient frames is based on results of detection...

Full text to download in external service
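
A frame-energy detection function of the kind mentioned in this abstract can be sketched as follows. The sketch handles a single band only, because the excerpt does not say how the per-band results are combined. The function names, the energy-ratio criterion, and the values of `frame_len` and `ratio_threshold` are assumptions chosen for illustration, not the paper's algorithm.

```python
def frame_energies(signal, frame_len):
    """Energy of consecutive non-overlapping frames (a trailing partial frame is dropped)."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]

def detect_transients(signal, frame_len=4, ratio_threshold=4.0, floor=1e-9):
    """Return indices of frames whose energy jumps sharply over the previous frame."""
    energies = frame_energies(signal, frame_len)
    transient_frames = []
    for i in range(1, len(energies)):
        # Assumed detection function: energy ratio between neighbouring frames.
        # The small floor keeps the division defined for silent frames.
        ratio = energies[i] / (energies[i - 1] + floor)
        if ratio > ratio_threshold:
            transient_frames.append(i)
    return transient_frames

# Quiet segment, sudden loud onset, then quiet again.
toy_signal = [0.1] * 8 + [1.0] * 8 + [0.1] * 4
onsets = detect_transients(toy_signal)
```

Only the frame where the level rises is reported; the sustained loud frame and the decay are not, since the ratio criterion reacts to energy increases only.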

Improving the quality of speech in the conditions of noise and interference

Publication

B. Kostek
K. Kąkol

- Journal of the Acoustical Society of America - Year 2018

The aim of the work is to present a method of intelligent modification of the speech signal with speech features expressed in noise, based on the Lombard effect. The recordings utilized sets of words and sentences as well as disturbing signals, i.e., pink noise and the so-called babble speech. Noise signal, calibrated to various levels at the speaker's ears, was played over two loudspeakers located 2 m away from the speaker. In...

Full text to download in external service

Applying the Lombard Effect to Speech-in-Noise Communication

Publication

G. Korvel
K. Kąkol
P. Treigys
B. Kostek

- Electronics - Year 2023

This study explored how the Lombard effect, a natural or artificial increase in speech loudness in noisy environments, can improve speech-in-noise communication. This study consisted of several experiments that measured the impact of different types of noise on synthesizing the Lombard effect. The main steps were as follows: first, a dataset of speech samples with and without the Lombard effect was collected in a controlled setting;...

Full text available to download

Constructing a Dataset of Speech Recordings with Lombard Effect

Publication

D. Weber
S. Zaporowski
D. Korzekwa

- Year 2020

The purpose of the recordings was to create a speech corpus based on the ISLE dataset, extended with video and Lombard speech. Selected from a set of 165 sentences, 10, evaluated as having the highest possibility to occur in the context of the Lombard effect, were repeated in the presence of the so-called babble speech to obtain Lombard speech features. Altogether, 15 speakers were recorded, and speech parameters were...

Improved method for real-time speech stretching

Publication

- Year 2012

An algorithm for real-time speech stretching is presented. It was designed to modify the input signal depending on its content and on its relation with the historical input data. The proposed algorithm is a combination of speech signal analysis algorithms, i.e. voice, vowels/consonants, stuttering detection and SOLA (Synchronous-Overlap-and-Add) based speech stretching algorithm. This approach enables stretching input speech signal...

Full text to download in external service

Methodology and technology for the polymodal allophonic speech transcription

Publication

- Journal of the Acoustical Society of America - Year 2016

A method for automatic audiovisual transcription of speech employing acoustic and visual speech representations is developed. It combines audio and visual modalities, which provide a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e. the changes in the articulatory setting of speech organs for...

Full text to download in external service

Methodology and technology for the polymodal allophonic speech transcription

Publication

- Journal of the Acoustical Society of America - Year 2016

A method for automatic audiovisual transcription of speech employing acoustic, electromagnetic articulography and visual speech representations is developed. It combines audio and visual modalities, which provide a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e., the changes in the articulatory...

Full text to download in external service

Real-time speech-rate modification experiments

Publication

- Year 2010

An algorithm designed for real-time speech time scale modification (stretching) is proposed. It combines a typical synchronous overlap and add based time scale modification algorithm with signal redundancy detection algorithms that allow parts of the speech signal to be removed and replaced with the stretched speech signal fragments. The effectiveness of the signal processing algorithms is examined experimentally together...

Full text to download in external service

Improving Objective Speech Quality Indicators in Noise Conditions

Publication

K. Kąkol
G. Korvel
B. Kostek

- Year 2020

This work aims at modifying speech signal samples and testing them with objective speech quality indicators after mixing the original signals with noise or with an interfering signal. Modifications that are applied to the signal are related to the Lombard speech characteristics, i.e., pitch shifting, utterance duration changes, vocal tract scaling, manipulation of formants. A set of words and sentences in Polish, recorded in silence,...

Full text to download in external service

Detecting Lombard Speech Using Deep Learning Approach

Publication

K. Kąkol
G. Korvel
G. Tamulevicius
B. Kostek

- SENSORS - Year 2023

Robust detection of Lombard speech in noise is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The framework proposed combines convolutional neural networks...

Full text available to download

Speech synthesis controlled by eye gazing

Publication

- Year 2010

A method of communication based on eye-gaze control is presented. Investigations of gaze tracking have been carried out in various application contexts. The solution proposed in the paper could be referred to as "talking by eyes", providing an innovative approach in the domain of speech synthesis. The application proposed is dedicated to disabled people, especially to persons in a so-called locked-in syndrome who cannot...

Time-domain prosodic modifications for text-to-speech synthesizer

Publication

- Year 2010

An application of prosodic speech processing algorithms to Text-To-Speech synthesis is presented. Prosodic modifications that improve the naturalness of the synthesized signal are discussed. The applied method is based on the TD-PSOLA algorithm. The developed Text-To-Speech Synthesizer is used in applications employing multimodal computer interfaces.

A Method of Real-Time Non-uniform Speech Stretching

Publication

- Year 2012

A developed method of real-time non-uniform speech stretching is presented. The proposed solution is based on the well-known SOLA algorithm (Synchronous Overlap and Add). Non-uniform time-scale modification is achieved by the adjustment of time scaling factor values in accordance with the signal content. Depending on the speech unit (vowels/consonants), instantaneous rate of speech (ROS), and speech signal presence, values of the scaling factor...

Full text to download in external service
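
For readers unfamiliar with SOLA, the core overlap-and-add step can be sketched as below. This is a plain uniform-rate illustration with a fixed scaling factor `alpha`, not the non-uniform method of the paper: how the scaling factor is chosen from the signal content is truncated in the excerpt and is not reproduced here. The function name, the default frame settings, and the linear cross-fade are assumptions.

```python
import math

def sola_stretch(x, alpha, frame_len=128, analysis_hop=32, search=8):
    """Time-stretch the sample list x by roughly the factor alpha using SOLA.

    Frames are read every analysis_hop samples and written every
    alpha * analysis_hop samples; each frame is shifted by up to +/- search
    samples to the position where it best matches the output so far.
    """
    synthesis_hop = int(round(alpha * analysis_hop))
    # This bound guarantees that every candidate position overlaps the output.
    if synthesis_hop < 1 or synthesis_hop + 2 * search >= frame_len:
        raise ValueError("alpha out of range for these frame settings")
    out = list(x[:frame_len])
    m = 1
    while m * analysis_hop + frame_len <= len(x):
        frame = x[m * analysis_hop:m * analysis_hop + frame_len]
        nominal = m * synthesis_hop
        best_k, best_score = 0, -math.inf
        for k in range(-search, search + 1):
            start = nominal + k
            if start < 0:
                continue
            overlap = min(len(out) - start, frame_len)
            # Normalized cross-correlation over the overlapping region.
            num = sum(out[start + i] * frame[i] for i in range(overlap))
            e_out = sum(out[start + i] ** 2 for i in range(overlap))
            e_frm = sum(frame[i] ** 2 for i in range(overlap))
            den = math.sqrt(e_out * e_frm)
            score = num / den if den > 0 else 0.0
            if score > best_score:
                best_k, best_score = k, score
        start = nominal + best_k
        overlap = min(len(out) - start, frame_len)
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)  # linear cross-fade weight
            out[start + i] = (1.0 - w) * out[start + i] + w * frame[i]
        out.extend(frame[overlap:])
        m += 1
    return out

tone = [math.sin(2 * math.pi * i / 50) for i in range(1024)]
stretched = sola_stretch(tone, alpha=1.5)
```

A non-uniform variant would change the scaling factor from frame to frame and accumulate the synthesis positions accordingly; the sketch keeps it constant for brevity.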

Voice command recognition using hybrid genetic algorithm

Publication

- TASK Quarterly - Year 2010

Speech recognition is a process of converting the acoustic signal into a set of words, whereas voice command recognition consists in the correct identification of voice commands, usually single words. Voice command recognition systems are widely used in the military, control systems, electronic devices, such as cellular phones, or by people with disabilities (e.g., for controlling a wheelchair or operating a computer...

Full text available to download

Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

Publication

D. Korzekwa
R. Barra-Chicote
B. Kostek
T. Drugman
M. Łajszczak

- Year 2019

We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...

Full text available to download

Examining Influence of Distance to Microphone on Accuracy of Speech Recognition

Publication

- Year 2015

The problem of controlling a machine by a distant-talking speaker without the need for handheld or body-worn equipment is considered. A laboratory setup is introduced for examining the performance of the developed automatic speech recognition system fed by direct and by distant speech acquired by microphones placed at three different distances from the speaker (0.5 m to 1.5 m). For feature extraction from the voice signal...

Full text to download in external service

Comparison of various speech time-scale modification methods

Publication

- Archives of Acoustics - Year 2011

The objective of this work is to investigate the influence of different time-scale modification (TSM) methods on the quality of speech stretched using the designed non-uniform real-time speech time-scale modification algorithm (NU-RTSM). The algorithm provides a combination of the typical TSM algorithm with the vowels, consonants, stutter, transients and silence detectors. Based on the information about the content and...

Speech codec enhancements utilizing time compression and perceptual coding

Publication

- Year 2007

A method for encoding a wideband speech signal employing standardized narrowband speech codecs is presented, as well as experimental results concerning detection of tonal spectral components. The speech signal, sampled at a higher rate than is suitable for a narrowband coding algorithm, is compressed in order to decrease the number of samples. Next, the time-compressed representation of a signal is encoded using a narrowband...

Tensor Decomposition for Imagined Speech Discrimination in EEG

Publication

J. S. Garcia Salinas
L. Villaseñor-Pineda
C. A. Reyes-García
A. A. Torres-García

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2018

Most research in electroencephalogram (EEG)-based Brain-Computer Interfaces (BCI) is focused on the use of motor imagery. As an attempt to improve the control of these interfaces, the use of language instead of movement has been recently explored, in the form of imagined speech. This work aims for the discrimination of imagined words in electroencephalogram signals. For this purpose, the analysis of multiple variables...

Full text to download in external service

Multimodal English corpus for automatic speech recognition

Publication

- Year 2013

A multimodal corpus developed for research on speech recognition based on audio-visual data is presented. Besides the usual video and sound excerpts, the prepared database also contains thermovision images and depth maps. All streams were recorded simultaneously, therefore the corpus makes it possible to examine the importance of the information provided by different modalities. Based on the recordings, it is also possible to develop a speech...

Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions

Publication

- Year 2016

Automatic speech recognition (ASR) is under constant development, especially in cases when speech is casually produced, acquired in various environmental conditions, or recorded in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is discussed in the presented work as the main focus in this process. ASR is widely implemented in mobile devices technology, but...

Full text to download in external service

An audio-visual corpus for multimodal automatic speech recognition

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2017

A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings are provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...

Full text available to download

Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically

Publication

G. Korvel
K. Kąkol
P. Treigys
B. Kostek

- Year 2022

The aim of this study is two-fold. First, we perform a series of experiments to examine the interference of different noises on speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Then, we apply this knowledge to detecting speech with the Lombard effect. This is for preparing a dataset for training a machine learning-based...

Full text available to download

Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition

Publication

- Year 2016

The problem of accurately differentiating between the speaker utterance and the noise parts in a speech signal is considered. The influence of utilizing voice activity detection in speech signals on the accuracy of the automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting the voice activity in clean and noisy...

Ranking Speech Features for Their Usage in Singing Emotion Classification

Publication

- Year 2020

This paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...

Full text available to download
