Search results for: BIMODAL SPEECH RECOGNITION

Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging

Publication

- Year 2017

In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training program which minimizes the...

Full text available for download from an external service
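The abstract above refers to parallel training with frequent parameter averaging. Below is a minimal pure-Python sketch of that general idea. Model parameters are assumed to be toy dictionaries of float lists. Each hypothetical worker starts from the shared model and runs a few local SGD steps, and the workers' models are then averaged element-wise. This is neither Kaldi's implementation nor the MPI-based modification proposed in the paper. The names `average_parameters`, `sgd_step` and `train_with_averaging` are hypothetical.

```python
from typing import Callable, Dict, List

# Model parameters: name -> flat vector (a toy stand-in for real weight matrices).
Params = Dict[str, List[float]]
GradFn = Callable[[Params], Params]


def average_parameters(worker_params: List[Params]) -> Params:
    """Element-wise mean of every named parameter vector across workers."""
    if not worker_params:
        raise ValueError("need at least one worker")
    n = len(worker_params)
    return {
        name: [sum(values) / n for values in zip(*(p[name] for p in worker_params))]
        for name in worker_params[0]
    }


def sgd_step(params: Params, grads: Params, lr: float) -> Params:
    """One plain SGD update: w <- w - lr * g."""
    return {k: [w - lr * g for w, g in zip(params[k], grads[k])] for k in params}


def train_with_averaging(init: Params, worker_grads: List[GradFn], lr: float,
                         local_steps: int, rounds: int) -> Params:
    """Each round: every worker trains locally from the shared model, then the
    shared model is replaced by the average of the workers' models."""
    shared = init
    for _ in range(rounds):
        local_models = []
        for grad_fn in worker_grads:  # these would run in parallel on real hardware
            p = {k: list(v) for k, v in shared.items()}
            for _ in range(local_steps):
                p = sgd_step(p, grad_fn(p), lr)
            local_models.append(p)
        shared = average_parameters(local_models)
    return shared


# Toy usage: worker i sees the loss (w - c_i)^2, i.e. a different data shard.
def make_grad(c: float) -> GradFn:
    return lambda p: {"w": [2.0 * (p["w"][0] - c)]}


model = train_with_averaging({"w": [0.0]}, [make_grad(0.0), make_grad(4.0)],
                             lr=0.1, local_steps=5, rounds=50)
print(model["w"][0])  # close to 2.0, the minimiser of the averaged loss
```

How often to average is the key trade-off. Averaging after every step matches synchronous SGD on the combined data but maximises communication. Averaging less often reduces the distribution overhead, at the cost of letting the workers' models drift apart between averages.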

Performance Analysis of the OpenCL Environment on Mobile Platforms

Publication

- Year 2022

Today’s smartphones have more and more features that until recently were associated only with personal computers. Every year these devices are built from better and more efficient components. Everything suggests that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

Full text available for download from an external service

Separability Assessment of Selected Types of Vehicle-Associated Noise

Publication

- Advances in Intelligent Systems and Computing - Year 2016

The Music Information Retrieval (MIR) field, together with the development of speech and environmental information recognition techniques, has brought various tools intended for recognizing low-level features of acoustic signals based on a set of calculated parameters. In this study, the MIRtoolbox MATLAB tool, designed for music parameter extraction, is used to obtain a vector of parameters to check whether they are suitable for separation of...

Full text available for download from an external service

Towards More Realistic Probabilistic Models for Data Structures: The External Path Length in Tries under the Markov Model

Publication

K. Leckey
R. Neininger
W. Szpankowski

- Year 2013

Tries are among the most versatile and widely used data structures on words. They are pertinent to the (internal) structure of (stored) words and several splitting procedures used in diverse contexts ranging from document taxonomy to IP address lookup, from data compression (i.e., the Lempel-Ziv'77 scheme) to dynamic hashing, from partial-match queries to speech recognition, from leader election algorithms to distributed hashing...
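The quantity studied in the paper above is the external path length of a trie. This is the sum of the depths at which the stored words end up. It satisfies the recurrence L(S) = |S| + Σ_a L(S_a) for |S| ≥ 2, where S_a is the set of words beginning with symbol a, with that symbol removed, and L(S) = 0 for |S| ≤ 1. A minimal Python sketch follows, assuming distinct, prefix-free words. The binary source `markov_word` is a simplified toy with a hypothetical `p_stay` parameter. It is not the general Markov model analysed in the paper.

```python
import random
from collections import defaultdict
from typing import List


def external_path_length(words: List[str]) -> int:
    """Sum of leaf depths in the trie built from distinct, prefix-free words."""
    if len(words) <= 1:
        return 0  # a single word sits at the root of this subtrie
    groups = defaultdict(list)
    for w in words:
        if not w:
            raise ValueError("words must be distinct and prefix-free")
        groups[w[0]].append(w[1:])  # branch on the first symbol
    # Every word in this subtrie is pushed one level deeper, hence len(words).
    return len(words) + sum(external_path_length(g) for g in groups.values())


def markov_word(length: int, p_stay: float, rng: random.Random) -> str:
    """Binary word from a symmetric first-order Markov chain (toy model):
    the previous symbol is repeated with probability p_stay."""
    bits = [rng.choice("01")]
    for _ in range(length - 1):
        prev = bits[-1]
        bits.append(prev if rng.random() < p_stay else ("1" if prev == "0" else "0"))
    return "".join(bits)


print(external_path_length(["00", "01", "1"]))  # depths 2 + 2 + 1 -> 5
rng = random.Random(1)
sample = [markov_word(64, 0.7, rng) for _ in range(100)]
print(external_path_length(sample))
```

Averaging `external_path_length(sample)` over many random samples gives an empirical estimate of the expected external path length under this toy source. The estimate can then be compared with the asymptotic results for the model.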

Bimodal deep learning model for subjectively enhanced emotion classification in films

Publication

D. Weber
B. Kostek

- INFORMATION SCIENCES - Year 2024

This research delves into the concept of color grading in film, focusing on how color influences the emotional response of the audience. The study began by reviewing state-of-the-art works that process audio-video signals and associated emotions using machine learning. Then, the assumptions of subjective tests for refining and validating an emotion model for assigning specific emotional labels to selected film excerpts were presented....

Full text available for download from an external service

Computer-assisted pronunciation training—Speech synthesis is almost all you need

Publication

D. Korzekwa
J. Lorenzo-Trueba
T. Drugman
B. Kostek

- SPEECH COMMUNICATION - Year 2022

The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers have focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high...

Full text available for download on the portal

Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement

Publication

G. Korvel
K. Kąkol
O. Kurasova
B. Kostek

- IEEE Access - Year 2020

The Lombard effect is one of the most well-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech...

Full text available for download on the portal

Speech Intelligibility Measurements in Auditorium

Publication

K. Leo

- ACTA PHYSICA POLONICA A - Year 2010

Speech intelligibility was measured in the Auditorium Novum at the Technical University of Gdansk (seating capacity 408, volume 3300 m³). Articulation tests were conducted, and the STI and Early Decay Time (EDT) coefficients were measured. The negative contribution of noise to speech intelligibility was taken into account. Subjective measurements and objective tests reveal high speech intelligibility at most seats in the auditorium. A correlation was found between...

Full text available for download on the portal

Transient detection for speech coding applications

Publication

- International Journal of Computer Science and Network Security - Year 2006

Signal quality in speech codecs may be improved by selecting transients from the speech signal and encoding them using a suitable method. This paper presents an algorithm for transient detection in speech signals. The algorithm operates in several frequency bands. Transient detection functions are calculated from the energy measured in short frames of the signal. The final selection of transient frames is based on the results of detection...

Full text available for download from an external service
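The abstract above outlines the detector in three steps. Energies are measured in short frames in several frequency bands, these energies feed transient detection functions, and transient frames are selected from the per-band results. A minimal Python sketch follows, under loudly stated assumptions. It uses only two bands from a crude 2-tap split. The detection function is taken to be the energy jump, in dB, over the mean of the previous frames. Frames are selected by a simple band-voting rule. The threshold and all function names are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def frame_energies(x: Sequence[float], frame_len: int) -> List[float]:
    """Energy (sum of squares) of consecutive non-overlapping frames."""
    return [sum(v * v for v in x[i:i + frame_len])
            for i in range(0, len(x) - frame_len + 1, frame_len)]


def split_bands(x: Sequence[float]) -> List[List[float]]:
    """Crude two-band split (assumption): a 2-tap average as the low band and
    the residual as the high band. A real coder would use a proper filter bank."""
    low = [(x[i] + (x[i - 1] if i > 0 else 0.0)) / 2 for i in range(len(x))]
    high = [a - b for a, b in zip(x, low)]
    return [low, high]


def detection_function(energies: List[float], history: int = 4,
                       eps: float = 1e-12) -> List[float]:
    """Frame energy relative to the mean of the previous `history` frames, in dB."""
    out = []
    for i, e in enumerate(energies):
        past = energies[max(0, i - history):i]
        ref = sum(past) / len(past) if past else e
        out.append(10 * math.log10((e + eps) / (ref + eps)))
    return out


def detect_transients(x: Sequence[float], frame_len: int = 64,
                      threshold_db: float = 9.0, min_bands: int = 1) -> List[int]:
    """Indices of frames whose energy jump exceeds the threshold in at least
    `min_bands` bands (a simple voting rule chosen for illustration)."""
    per_band = [detection_function(frame_energies(b, frame_len))
                for b in split_bands(x)]
    n_frames = len(per_band[0])
    return [i for i in range(n_frames)
            if sum(d[i] > threshold_db for d in per_band) >= min_bands]


# Toy signal: a quiet sine (period 32 samples) with a loud click burst in frame 10.
signal = [0.01 * math.sin(2 * math.pi * n / 32) for n in range(1024)]
for n in range(640, 704):
    signal[n] += 1.0 if n % 2 == 0 else -1.0
print(detect_transients(signal))  # [10]
```

The detection function compares each frame with its recent past rather than with a fixed level, so a loud but steady signal is not reported as a transient. Only sudden energy increases cross the threshold.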

Improving the quality of speech in the conditions of noise and interference

Publication

B. Kostek
K. Kąkol

- Journal of the Acoustical Society of America - Year 2018

The aim of the work is to present a method for intelligently modifying the speech signal, based on the Lombard effect, with speech features expressed in noise. The recordings used sets of words and sentences as well as disturbing signals, i.e., pink noise and so-called babble speech. The noise signal, calibrated to various levels at the speaker's ears, was played over two loudspeakers located 2 m away from the speaker. In...

Full text available for download from an external service

Applying the Lombard Effect to Speech-in-Noise Communication

Publication

G. Korvel
K. Kąkol
P. Treigys
B. Kostek

- Electronics - Year 2023

This study explored how the Lombard effect, a natural or artificial increase in speech loudness in noisy environments, can improve speech-in-noise communication. This study consisted of several experiments that measured the impact of different types of noise on synthesizing the Lombard effect. The main steps were as follows: first, a dataset of speech samples with and without the Lombard effect was collected in a controlled setting;...

Full text available for download on the portal

Constructing a Dataset of Speech Recordings with Lombard Effect

Publication

D. Weber
S. Zaporowski
D. Korzekwa

- Year 2020

The purpose of the recordings was to create a speech corpus based on the ISLE dataset, extended with video and Lombard speech. Ten sentences, selected from a set of 165 as having the highest likelihood of occurring in the context of the Lombard effect, were repeated in the presence of so-called babble speech to obtain Lombard speech features. Altogether, 15 speakers were recorded, and speech parameters were...

Improved method for real-time speech stretching

Publication

- Year 2012

An algorithm for real-time speech stretching is presented. It was designed to modify the input signal depending on its content and on its relation to the historical input data. The proposed algorithm combines speech signal analysis algorithms, i.e., voice, vowel/consonant and stuttering detection, with a SOLA (Synchronous-Overlap-and-Add) based speech stretching algorithm. This approach enables stretching the input speech signal...

Full text available for download from an external service

Real-time speech-rate modification experiments

Publication

- Year 2010

An algorithm designed for real-time speech time-scale modification (stretching) is proposed. It combines a typical synchronous-overlap-and-add-based time-scale modification algorithm with signal redundancy detection algorithms that allow parts of the speech signal to be removed and replaced with stretched speech signal fragments. The effectiveness of the signal processing algorithms is examined experimentally together...

Full text available for download from an external service

Improving Objective Speech Quality Indicators in Noise Conditions

Publication

K. Kąkol
G. Korvel
B. Kostek

- Year 2020

This work aims at modifying speech signal samples and testing them with objective speech quality indicators after mixing the original signals with noise or with an interfering signal. The modifications applied to the signal are related to Lombard speech characteristics, i.e., pitch shifting, utterance duration changes, vocal tract scaling, and manipulation of formants. A set of words and sentences in Polish, recorded in silence,...

Full text available for download from an external service

Detecting Lombard Speech Using Deep Learning Approach

Publication

K. Kąkol
G. Korvel
G. Tamulevicius
B. Kostek

- SENSORS - Year 2023

Robust detection of Lombard speech in noise is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, the assumptions of the work performed for Lombard speech detection are outlined. The proposed framework combines convolutional neural networks...

Full text available for download on the portal

Speech synthesis controlled by eye gazing

Publication

- Year 2010

A method of communication based on eye-gaze control is presented. Investigations of the use of gaze tracking have been carried out in various application contexts. The solution proposed in the paper could be referred to as "talking by eyes", providing an innovative approach in the domain of speech synthesis. The application proposed is dedicated to disabled people, especially to persons with so-called locked-in syndrome who cannot...

Acoustic Sensing Analytics Applied to Speech in Reverberation Conditions

Publication

- SENSORS - Year 2021

The paper aims to discuss a case study of sensing analytics and technology in acoustics when applied to reverberation conditions. Reverberation is one of the issues that makes speech in indoor spaces challenging to understand. This problem is particularly critical in large spaces with few absorbing or diffusing surfaces. One of the natural remedies to improve speech intelligibility in such conditions may be achieved through speaking...

Full text available for download on the portal

Time-domain prosodic modifications for text-to-speech synthesizer

Publication

- Year 2010

An application of prosodic speech processing algorithms to Text-To-Speech synthesis is presented. Prosodic modifications that improve the naturalness of the synthesized signal are discussed. The applied method is based on the TD-PSOLA algorithm. The developed Text-To-Speech Synthesizer is used in applications employing multimodal computer interfaces.

A Method of Real-Time Non-uniform Speech Stretching

Publication

- Year 2012

A method of real-time non-uniform speech stretching is presented. The proposed solution is based on the well-known SOLA algorithm (Synchronous Overlap and Add). Non-uniform time-scale modification is achieved by adjusting the time-scaling factor in accordance with the signal content. Depending on the speech unit (vowels/consonants), the instantaneous rate of speech (ROS), and the presence of a speech signal, values of the scaling factor...

Full text available for download from an external service
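The abstract above describes SOLA-based stretching in which the time-scaling factor follows the signal content. A minimal Python sketch of that idea follows, assuming a caller-supplied per-frame scaling rule. Each analysis frame is placed at a nominal synthesis position. Within a small search range, it is then shifted to the offset of maximum normalised cross-correlation with the output already produced, and cross-faded in. The frame length, hop, search range and energy-threshold rule are all assumptions. The energy threshold is only a crude stand-in for the vowel/consonant, ROS and speech-presence detectors mentioned in the abstract. `sola_stretch` and `energy_based_scale` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence

# scale_for_frame(frame_index, frame) -> local time-scale factor (> 1 slows speech down)
ScaleFn = Callable[[int, Sequence[float]], float]


def _norm_xcorr(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalised cross-correlation of two equally long segments."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def sola_stretch(x: Sequence[float], scale_for_frame: ScaleFn, frame_len: int = 256,
                 hop: int = 64, search: int = 32) -> List[float]:
    """Time-scale x with SOLA, letting the scaling factor change per frame."""
    y = list(x[:frame_len])
    pos = 0  # nominal output position of the current frame
    m = 1
    while m * hop + frame_len <= len(x):
        frame = x[m * hop:m * hop + frame_len]
        syn_hop = max(1, round(scale_for_frame(m, frame) * hop))
        if syn_hop + search >= frame_len:
            raise ValueError("frame_len must exceed synthesis hop + search range")
        pos += syn_hop
        # Find the shift k that best aligns the new frame with the output tail.
        best_k, best_c = 0, -2.0
        for k in range(search + 1):
            overlap = min(len(y) - (pos + k), frame_len)
            c = _norm_xcorr(y[pos + k:pos + k + overlap], frame[:overlap])
            if c > best_c:
                best_k, best_c = k, c
        start = pos + best_k
        overlap = min(len(y) - start, frame_len)
        # Linear cross-fade over the overlap, then append the rest of the frame.
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)
            y[start + i] = (1 - w) * y[start + i] + w * frame[i]
        y.extend(frame[overlap:])
        m += 1
    return y


def energy_based_scale(m: int, frame: Sequence[float]) -> float:
    """Hypothetical rule: stretch loud (speech-like) frames by 1.5 and
    leave near-silent frames untouched."""
    energy = sum(v * v for v in frame) / len(frame)
    return 1.5 if energy > 1e-3 else 1.0


# Toy input: 0.5 s of a 200 Hz tone followed by 0.5 s of silence at 8 kHz.
fs = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(fs // 2)]
x = tone + [0.0] * (fs // 2)
y = sola_stretch(x, energy_based_scale)
print(len(x), len(y))  # output is about 25% longer: only the tone half is stretched
```

The cross-correlation search is what distinguishes SOLA from plain overlap-add. Aligning each frame with the existing output keeps the waveform periodicity intact, which avoids the phase jumps that would otherwise be audible in voiced speech.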

Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

Publication

D. Korzekwa
R. Barra-Chicote
B. Kostek
T. Drugman
M. Łajszczak

- Year 2019

We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model's key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...

Full text available for download on the portal

Uncertainty in emotion recognition

Publication

A. Landowska

- Journal of Information, Communication and Ethics in Society - Year 2019

Purpose: The purpose of this paper is to explore the uncertainty inherent in emotion recognition technologies and the consequences resulting from that phenomenon. Design/methodology/approach: The paper is a general overview of the concept; however, it is based on a meta-analysis of multiple experimental and observational studies performed over the past couple of years. Findings: The main finding of the paper might be summarized as follows:...

Full text available for download from an external service

Comparison of various speech time-scale modification methods

Publication

- Archives of Acoustics - Year 2011

The objective of this work is to investigate the influence of different time-scale modification (TSM) methods on the quality of speech stretched using the designed non-uniform real-time speech time-scale modification algorithm (NU-RTSM). The algorithm combines a typical TSM algorithm with vowel, consonant, stutter, transient and silence detectors. Based on the information about the content and...

Speech codec enhancements utilizing time compression and perceptual coding

Publication

- Year 2007

A method for encoding a wideband speech signal employing standardized narrowband speech codecs is presented, together with experimental results concerning the detection of tonal spectral components. The speech signal, sampled at a higher rate than is suitable for the narrowband coding algorithm, is time-compressed in order to decrease the number of samples. Next, the time-compressed representation of the signal is encoded using a narrowband...

Tensor Decomposition for Imagined Speech Discrimination in EEG

Publication

J. S. Garcia Salinas
L. Villaseñor-Pineda
C. A. Reyes-García
A. A. Torres-García

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2018

Most research on Electroencephalogram (EEG)-based Brain-Computer Interfaces (BCI) focuses on the use of motor imagery. In an attempt to improve the control of these interfaces, the use of language instead of movement has recently been explored, in the form of imagined speech. This work aims at the discrimination of imagined words in electroencephalogram signals. For this purpose, the analysis of multiple variables...

Full text available for download from an external service

Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit

Publication

- Diagnostic Pathology - Year 2012

Methods developed for real-time time-scale modification (TSM) of the speech signal are presented. They are based on the non-uniform, speech-rate-dependent SOLA algorithm (Synchronous Overlap and Add). The influence of the proposed method on the intelligibility of speech was investigated for two separate groups of listeners, i.e., hearing-impaired children and elderly listeners. It was shown that for speech with an average rate equal to or...

Full text available for download on the portal

Recognition and sensing of anions

Publication

- Year 2013

Molecular ion recognition is one of the most intensively studied areas of supramolecular technology. The reason for this is the essential role that ions play in many biological as well as industrial processes. However, it has also been proven that ions can have a negative impact on human health and the environment. For these reasons, it is extremely important to develop rapid and simple methods allowing the determination...

Integration in Multichannel Emotion Recognition

Publication

- Year 2018

The paper concerns the integration of results provided by automatic emotion recognition algorithms. It presents both the challenges and the approaches to solving them. The paper shows experimental results of integration. It might be of interest to researchers and practitioners who deal with automatic emotion recognition and use more than one solution or multichannel observation.

Full text available for download on the portal

Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically

Publication

G. Korvel
K. Kąkol
P. Treigys
B. Kostek

- Year 2022

The aim of this study is two-fold. First, we perform a series of experiments to examine the interference of different noises on speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Then, we apply this knowledge to detecting speech with the Lombard effect. The goal is to prepare a dataset for training a machine learning-based...

Full text available for download on the portal

Ranking Speech Features for Their Usage in Singing Emotion Classification

Publication

- Year 2020

This paper aims to identify speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected low-level MPEG-7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing by professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...

Full text available for download on the portal

Human emotion recognition with biosignals

Publication

W. Szwoch

- Year 2022

This chapter presents issues in the field of affective computing. Basic preliminary information on the recognition of emotions is given, and models of emotions, various ways of evoking emotions, and their theoretical foundations are discussed. Particular attention is given to the use of physiological signals in recognizing emotions. This subject is outlined further by presenting selected biosignals, their relationship...

Full text available for download from an external service

System Supporting Speech Perception in Special Educational Needs Schoolchildren

Publication

- Year 2012

The paper presents a system supporting speech perception during classes. The system combines a portable device that enables real-time speech stretching with a workstation designed to perform hearing tests. The system was designed to help children with Central Auditory Processing Disorders.

Full text available for download from an external service

High quality speech codec employing sines+noise+transients model

Publication

- Archives of Acoustics - Year 2006

A method of high-quality wideband speech signal representation employing a sines+transients+noise model is presented. The need for a wideband speech coding approach, as well as various methods for the analysis and synthesis of the sines, residual and transient states of the speech signal, is discussed. A perceptual criterion is applied in the proposed approach during the encoding of sine amplitudes in order to reduce bandwidth requirements and...

Full text available for download on the portal

Silence/noise detection for speech and music signals

Publication

M. Papaj

- Year 2008

This paper introduces a novel off-line algorithm for silence/noise detection in noisy signals. The main concept of the proposed algorithm is to provide noise patterns for further signal processing, i.e., noise reduction for speech enhancement. The algorithm is based on frequency-domain characteristics of signals. Examples of different types of noisy signals are presented.

Virtual keyboard controlled by eye gaze employing speech synthesis

Publication

- Year 2010

The article presents speech synthesis integrated into an eye-gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. A virtual (QWERTY) keyboard serves as the interface for entering text for the speech synthesizer. First, the article describes a methodology for determining the fixation point on a computer screen. Then it presents...

Virtual Keyboard controlled by eye gaze employing speech synthesis

Publication

- Elektronika : konstrukcje, technologie, zastosowania - Year 2011

The article presents speech synthesis integrated into an eye-gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. A virtual (QWERTY) keyboard serves as the interface for entering text for the speech synthesizer. First, the article describes a methodology for determining the fixation point on a computer screen. Then it presents...

Full text available for download from an external service

Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions

Publication

K. Kąkol
G. Korvel
B. Kostek

- Year 2018

The aim of the work is to analyze the Lombard effect in speech recordings and then to modify the speech signal so as to improve objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of Lombard speech, in particular on the effect of increasing the fundamental frequency...

Corrupted speech intelligibility improvement using adaptive filter based algorithm

Publication

- Year 2010

A technique for improving the quality of speech signals recorded in strong noise is presented. The proposed algorithm employing adaptive filtration is described, and additional possibilities for improving speech intelligibility are discussed. Test results are presented.
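The abstract above does not specify which adaptive filter the algorithm uses. As a hedged illustration only, here is a Python sketch of the classic two-sensor adaptive noise canceller based on the LMS (least-mean-squares) algorithm. A reference input containing only noise is adaptively filtered to estimate the noise in the primary (speech + noise) input, and that estimate is subtracted. The two-sensor setup, tap count, step size `mu`, the function name `lms_noise_canceller` and the synthetic test signals are all assumptions, not details from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple


def lms_noise_canceller(primary: Sequence[float], reference: Sequence[float],
                        taps: int = 16, mu: float = 0.005) -> Tuple[List[float], List[float]]:
    """Adaptive noise cancellation with the LMS algorithm.

    primary:   speech + noise (the noise reached this sensor via an unknown path)
    reference: noise only, correlated with the noise in `primary`
    Returns the enhanced signal (the error e = d - y) and the final filter weights.
    """
    w = [0.0] * taps
    buf = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for d, r in zip(primary, reference):
        buf = [r] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))  # estimate of the noise in d
        e = d - y                                    # speech estimate
        w = [wi + 2 * mu * e * xi for wi, xi in zip(w, buf)]  # LMS weight update
        out.append(e)
    return out, w


# Toy test: a sine stands in for speech; the noise path is 0.8*n[k] + 0.3*n[k-1].
rng = random.Random(0)
N = 5000
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]
speech = [0.5 * math.sin(2 * math.pi * 0.01 * k) for k in range(N)]
primary = [s + 0.8 * noise[k] + 0.3 * (noise[k - 1] if k else 0.0)
           for k, s in enumerate(speech)]
enhanced, weights = lms_noise_canceller(primary, noise)
tail = range(N - 2000, N)  # evaluate after the filter has converged
mse_before = sum((primary[k] - speech[k]) ** 2 for k in tail) / len(tail)
mse_after = sum((enhanced[k] - speech[k]) ** 2 for k in tail) / len(tail)
print(round(mse_before, 2), round(mse_after, 3))
```

The step size `mu` trades convergence speed against residual error. A larger value tracks a changing noise path faster but leaves more misadjustment noise in the output, and it must stay well below the stability limit set by the reference signal power and the number of taps.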

Distortion of speech signals in the listening area: its mechanism and measurements

Publication

- Year 2014

The paper deals with the influence of the number and distribution of loudspeakers in speech reinforcement systems on the quality of publicly addressed voice messages, namely on speech intelligibility in the listening area. The linear superposition of time-shifted broadband waves of the same form and slightly different magnitudes that reach a listener from numerous coherent sources is accompanied by interference effects...

Full text available for download from an external service

A non-uniform real-time speech time-scale stretching method

Publication

- Year 2011

An algorithm for non-uniform real-time speech stretching is presented. It combines the typical SOLA algorithm (Synchronous Overlap and Add) with vowel, consonant and silence detectors. Based on the information about the content and the estimated rate of speech (ROS), the algorithm adapts the scaling factor value. The real-time stretching capability and the resulting voice quality were...

Automatic sound recognition for security purposes

Publication

P. Żwan

- Year 2008

In the paper, an automatic sound recognition system is presented. It forms part of a larger security system developed to monitor outdoor places for atypical audio-visual events. The analyzed audio signal is recorded by a microphone mounted outdoors, so non-stationary noise of significant energy is present in it. In the paper, a specially designed algorithm for outdoor noise reduction is presented,...

Building Knowledge for the Purpose of Lip Speech Identification

Publication

- Advances in Intelligent Systems and Computing - Year 2017

This study shows the consecutive stages of building knowledge for automatic lip speech identification. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared, taking into account the frequencies of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...

Full text available for download from an external service

Recognition of Hand Drawn Flowcharts

Publication

W. Szwoch
M. Mucha

- Year 2013

This paper presents the problem of recognizing hand-drawn flowcharts. Two approaches to this problem are described: on-line and off-line. The concept of FCE, a system for recognizing and understanding freehand-drawn on-line flowcharts on desktop computers and mobile devices, is presented. The first experiments with the FCE system and plans for the future are also described.

Semantic Integration of Heterogeneous Recognition Systems

Publication

P. Kaczmarek
P. Raszkowski

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2011

Computer perception of real-life situations is performed using a variety of recognition techniques, including video-based computer vision, biometric systems, RFID devices and others. The proliferation of recognition modules enables development of complex systems by integration of existing components, analogously to the Service Oriented Architecture technology. In the paper, we propose a method that enables integration of information...

Using Physiological Signals for Emotion Recognition

Publication

W. Szwoch

- Year 2013

Recognizing users' emotions is a promising area of research in the field of human-computer interaction. Emotions can be recognized from facial expressions, audio signals, body poses, gestures, etc., but physiological signals are very useful in this field because they are spontaneous and cannot be consciously controlled. In this paper, the problem of using physiological signals for emotion recognition is presented. The kinds of physiological...

Full text available for download from an external service

Communication Platform for Evaluation of Transmitted Speech Quality

Publication

- Journal of Telecommunications and Information Technology - Year 2011

The design and implementation of a voice communication system are described. The purpose of the platform was to enable a series of experiments related to the quality assessment of algorithms used in the coding and transmission of speech. The system is equipped with tools for recording signals at each stage of processing, making it possible to subject them to subjective assessment by listening tests or to objective evaluation employing...

Full text available for download on the portal

Emotion Recognition for Affect Aware Video Games

Publication

- Advances in Intelligent Systems and Computing - Year 2015

In this paper, the idea of affect-aware video games is presented. A brief review of automatic multimodal affect recognition of facial expressions and emotions is given. The first results of emotion recognition using depth data, as well as a prototype affect-aware video game, are presented.

Full text available for download from an external service

Emotion Recognition and Its Applications

Publication

- Advances in Intelligent Systems and Computing - Year 2014

The paper proposes a set of research scenarios to be applied in four domains: software engineering, website customization, education and gaming. The goal of applying the scenarios is to assess the possibility of using emotion recognition methods in these areas. It also points out the problems of defining sets of emotions to be recognized in different applications, representing the defined emotional states, gathering the data and...

Full text available for download from an external service

Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

Publication

- Elektronika : konstrukcje, technologie, zastosowania - Year 2008

In this paper, we propose a novel method of pitch estimation based on instantaneous complex frequency (ICF). A new iterative algorithm for the analysis of the ICF of a speech signal is presented. The obtained results are compared with commonly used methods to demonstrate its accuracy and the connection between ICF and pitch, particularly for narrowband-filtered speech signals.

Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

Publication

- Year 2007

In this paper, we propose a novel method of pitch estimation based on instantaneous complex frequency (ICF). A new iterative algorithm for the analysis of the ICF of a speech signal is presented. The obtained results are compared with commonly used methods to demonstrate its accuracy and the connection between ICF and pitch, particularly for narrowband-filtered speech signals.
