Search results for: speech recordings

Constructing a Dataset of Speech Recordings with Lombard Effect

Publication

D. Weber
S. Zaporowski
D. Korzekwa

- Year 2020

The purpose of the recordings was to create a speech corpus based on the ISLE dataset, extended with video and Lombard speech. Selected from a set of 165 sentences, 10, evaluated as having the highest possibility to occur in the context of the Lombard effect, were repeated in the presence of the so-called babble speech to obtain Lombard speech features. Altogether, 15 speakers were recorded, and speech parameters were...

Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions

Publication

- Year 2016

Automatic speech recognition (ASR) is under constant development, especially in cases when speech is casually produced or it is acquired in various environment conditions, or in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is discussed in the presented work as the main focus in this process. ASR is widely implemented in mobile devices technology, but...

Full text available for download from an external service

An audio-visual corpus for multimodal automatic speech recognition

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2017

A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings is provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...

Full text available for download in the portal

Detecting Lombard Speech Using Deep Learning Approach

Publication

K. Kąkol
G. Korvel
G. Tamulevicius
B. Kostek

- SENSORS - Year 2023

Robust Lombard speech-in-noise detecting is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The framework proposed combines convolutional neural networks...

Full text available for download in the portal

Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions

Publication

K. Kąkol
G. Korvel
B. Kostek

- Year 2018

The aim of the work is to analyze Lombard speech effect in recordings and then modify the speech signal in order to obtain an increase in the improvement of objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of the Lombard speech, and in particular on the effect of increasing the fundamental frequency...

Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement

Publication

G. Korvel
K. Kąkol
O. Kurasova
B. Kostek

- IEEE Access - Year 2020

The Lombard effect is one of the most well-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech...

Full text available for download in the portal

Audio-visual aspect of the Lombard effect and comparison with recordings depicting emotional states.

Publication

- Year 2018

In this paper an analysis of audio-visual recordings of the Lombard effect is shown. First, audio signal is analyzed indicating the presence of this phenomenon in the recorded sessions. The principal aim, however, was to discuss problems related to extracting differences caused by the Lombard effect, present in the video, i.e. visible as tension and work of facial muscles aligned to an increase in the intensity of the articulated...

Full text available for download from an external service

Improving the quality of speech in the conditions of noise and interference

Publication

B. Kostek
K. Kąkol

- Journal of the Acoustical Society of America - Year 2018

The aim of the work is to present a method of intelligent modification of the speech signal with speech features expressed in noise, based on the Lombard effect. The recordings utilized sets of words and sentences as well as disturbing signals, i.e., pink noise and the so-called babble speech. Noise signal, calibrated to various levels at the speaker's ears, was played over two loudspeakers located 2 m away from the speaker. In...

Full text available for download from an external service

Multimodal English corpus for automatic speech recognition

Publication

- Year 2013

A multimodal corpus developed for research of speech recognition based on audio-visual data is presented. Besides usual video and sound excerpts, the prepared database contains also thermovision images and depth maps. All streams were recorded simultaneously, therefore the corpus enables to examine the importance of the information provided by different modalities. Based on the recordings, it is also possible to develop a speech...

Visual Lip Contour Detection for the Purpose of Speech Recognition

Publication

- Year 2014

A method for visual detection of lip contours in frontal recordings of speakers is described and evaluated. The purpose of the method is to facilitate speech recognition with visual features extracted from a mouth region. Different Active Appearance Models are employed for finding lips in video frames and for lip shape and texture statistical description. Search initialization procedure is proposed and error measure values are...

Analysis of allophones based on audio signal recordings and parameterization

Publication

- Journal of the Acoustical Society of America - Year 2017

The aim of this study is to develop an allophonic description of English plosive consonants based on recordings of 600 specially selected words. Allophonic variations addressed in the study may have two sources: positional and contextual. The former one depends on the syllabic or prosodic position in which a particular phoneme occurs. Contextual allophony is conditioned by the local phonetic environment. Co-articulation overlapping...

Full text available for download from an external service

Ranking Speech Features for Their Usage in Singing Emotion Classification

Publication

- Year 2020

This paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...

Full text available for download in the portal

KORPUS MOWY ANGIELSKIEJ DO CELÓW MULTIMODALNEGO AUTOMATYCZNEGO ROZPOZNAWANIA MOWY [English speech corpus for multimodal automatic speech recognition]

Publication

- Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne - Year 2016

The paper presents an audio-visual speech corpus containing 31 hours of English speech recordings. The corpus is intended for automatic audio-visual speech recognition. It contains video recordings from a high-frame-rate stereoscopic camera and audio recorded by a microphone array and a laptop microphone. Thanks to the inclusion of recordings made in noisy conditions, the corpus...

A survey of automatic speech recognition deep models performance for Polish medical terms

Publication

- Year 2023

Among the numerous applications of speech-to-text technology is the support of documentation created by medical personnel. There are many available speech recognition systems for doctors. Their effectiveness in languages such as Polish should be verified. In connection with our project in this field, we decided to check how well the popular speech recognition systems work, employing models trained for the general Polish language....

Full text available for download from an external service

Database of speech and facial expressions recorded with optimized face motion capture settings

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2019

The broad objective of the present research is the analysis of spoken English employing a multiplicity of modalities. An important stage of this process, discussed in the paper, is creating a database of speech accompanied with facial expressions. Recordings of speakers were made using an advanced system for capturing facial muscle motion. A brief historical outline, current applications, limitations and the ways of capturing face...

Full text available for download in the portal

Objectivization of phonological evaluation of speech elements by means of audio parametrization

Publication

- Year 2018

This study addresses two issues related to both machine- and subjective-based speech evaluation by investigating five phonological phenomena related to allophone production. Its aim is to use objective parametrization and phonological classification of the recorded allophones. These allophones were selected as specifically difficult for Polish speakers of English: aspiration, final obstruent devoicing, dark lateral /l/, velar nasal...

Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech

Publication

D. Piotrowski
R. Korzeniowski
A. Falai
S. Cygert
K. Pokora
G. Tinchev
Z. Zhang
K. Yanagisawa

- Year 2023

In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...

Full text available for download from an external service

Language material for English audiovisual speech recognition system development. Materiał językowy do wykorzystania w systemie audiowizualnego rozpoznawania mowy angielskiej

Publication

A. Czyżewski
B. Kostek
T. Ciszewski
D. Majewicz

- Year 2013

The bi-modal speech recognition system requires a 2-sample language input for training and for testing algorithms which precisely depicts natural English speech. For the purposes of the audio-visual recordings, a training data base of 264 sentences (1730 words without repetitions; 5685 sounds) has been created. The language sample reflects vowel and consonant frequencies in natural speech. The recording material reflects both the...

Noise profiling for speech enhancement employing machine learning models

Publication

K. Kąkol
G. Korvel
B. Kostek

- Journal of the Acoustical Society of America - Year 2022

This paper aims to propose a noise profiling method that can be performed in near real-time based on machine learning (ML). To address challenges related to noise profiling effectively, we start with a critical review of the literature background. Then, we outline the experiment performed consisting of two parts. The first part concerns the noise recognition model built upon several baseline classifiers and noise signal features...

Full text available for download in the portal

Evaluation Criteria for Affect-Annotated Databases

Publication

- Year 2015

In this paper a set of comprehensive evaluation criteria for affect-annotated databases is proposed. These criteria can be used for evaluation of the quality of a database on the stage of its creation as well as for evaluation and comparison of existing databases. The usefulness of these criteria is demonstrated on several databases selected from affect computing domain. The databases contain different kind of data: video or still...

Full text available for download from an external service

Selection of Features for Multimodal Vocalic Segments Classification

Publication

- Year 2018

English speech recognition experiments are presented employing both audio signal and Facial Motion Capture (FMC) recordings. The principal aim of the study was to evaluate the influence of feature vector dimension reduction on the accuracy of vocalic segments classification employing neural networks. Several parameter reduction strategies were adopted, namely: Extremely Randomized Trees, Principal Component Analysis and Recursive...

Full text available for download from an external service

Analyzing the relationship between sound, color, and emotion based on subjective and machine-learning approaches

Publication

- Year 2024

The aim of the research is to analyze the relationship between sound, color, and emotion. For this purpose, a survey application was prepared, enabling the assignment of a color to a given speaker’s/singer’s voice recordings. Subjective tests were then conducted, enabling the respondents to assign colors to voice/singing samples. In addition, a database of voice/singing recordings of people speaking in a natural way and with expressed...

Full text available for download in the portal

Determining Pronunciation Differences in English Allophones Utilizing Audio Signal Parameterization

Publication

- Year 2017

An allophonic description of English plosive consonants, based on audio-visual recordings of 600 specially selected words, was developed. First, several speakers were recorded while reading words from a teleprompter. Then, every word was played back from the previously recorded sample read by a phonology expert and each examined speaker repeated a particular word trying to imitate correct pronunciation. The next step consisted...

Cross-domain applications of multimodal human-computer interfaces

Publication

A. Czyżewski

- Year 2015

Developed multimodal interfaces for education applications and for disabled people are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with mouth gestures and audio interface for speech stretching for hearing impaired and stuttering people and intelligent pen allowing for diagnosing and ameliorating developmental dyslexia. The eye-gaze tracking system named...

Elimination of Impulsive Disturbances From Stereo Audio Recordings Using Vector Autoregressive Modeling and Variable-order Kalman Filtering

Publication

- IEEE Transactions on Audio Speech and Language Processing - Year 2015

This paper presents a new approach to elimination of impulsive disturbances from stereo audio recordings. The proposed solution is based on vector autoregressive modeling of audio signals. Online tracking of signal model parameters is performed using the exponentially weighted least squares algorithm. Detection of noise pulses and model-based interpolation of the irrevocably distorted samples is realized using an adaptive, variable-order...

Full text available for download in the portal

Elimination of Impulsive Disturbances From Archive Audio Signals Using Bidirectional Processing

Publication

- IEEE Transactions on Audio Speech and Language Processing - Year 2013

In this application-oriented paper we consider the problem of elimination of impulsive disturbances, such as clicks, pops and record scratches, from archive audio recordings. The proposed approach is based on bidirectional processing—noise pulses are localized by combining the results of forward-time and backward-time signal analysis. Based on the results of specially designed empirical tests (rather than on the results of theoretical analysis),...

Full text available for download in the portal

Automatic music signal mixing system based on one-dimensional Wave-U-Net autoencoders

Publication

D. Koszewski
T. Görne
G. Korvel
B. Kostek

- EURASIP Journal on Audio Speech and Music Processing - Year 2023

The purpose of this paper is to show a music mixing system that is capable of automatically mixing separate raw recordings with good quality regardless of the music genre. This work recalls selected methods for automatic audio mixing first. Then, a novel deep model based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. The model is trained on a custom-prepared database. Mixes created using the...

Full text available for download in the portal
