Search results for: IMAGINED SPEECH

Visual Lip Contour Detection for the Purpose of Speech Recognition

Publication

- Year 2014

A method for visual detection of lip contours in frontal recordings of speakers is described and evaluated. The purpose of the method is to facilitate speech recognition with visual features extracted from a mouth region. Different Active Appearance Models are employed for finding lips in video frames and for lip shape and texture statistical description. Search initialization procedure is proposed and error measure values are...

A Method of Real-Time Non-uniform Speech Stretching

Publication

- Year 2012

Developed method of real-time non-uniform speech stretching is presented.The proposed solution is based on the well-known SOLA algorithm(Synchronous Overlap and Add). Non-uniform time-scale modification isachieved by the adjustment of time scaling factor values in accordance with thesignal content. Dependently on the speech unit (vowels/consonants), instantaneousrate of speech (ROS), and speech signal presence, values of the scalingfactor...

Full text to download in external service

Examining Influence of Distance to Microphone on Accuracy of Speech Recognition

Publication

- Year 2015

The problem of controlling a machine by the distant-talking speaker without a necessity of handheld or body-worn equipment usage is considered. A laboratory setup is introduced for examination of performance of the developed automatic speech recognition system fed by direct and by distant speech acquired by microphones placed at three different distances from the speaker (0.5 m to 1.5 m). For feature extraction from the voice signal...

Full text to download in external service

Comparison of various speech time-scale modificartion methods

Publication

- Archives of Acoustics - Year 2011

The objective of this work is to investigate the influence of the different time-scale modification (TSM) methods on the quality of the speech stretched up using the designed non-uniform real-time speech time-scale modification algorithm (NU-RTSM). The algorithm provides a combination of the typical TSM algorithm with the vowels, consonants, stutter, transients and silence detectors. Based on the information about the content and...

An audio-visual corpus for multimodal automatic speech recognition

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2017

review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings is provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...

Full text available to download

New approach to localization of clicks in archive speech signals.

Publication

- Year 2004

Przedstawiono problem lokalizacji zniekształceń impulsowych w archiwalnych sygnałach mowy. Pokazano, że detekcja oparta na dwuzakresowym modelu autoregresyjnym i przetwarzanie dwukierunkowe pozwala uzyskać znaczącą poprawę działania w stosunku do istniejących metod lokalizacji zniekształceń.

Advanced speech archiving and restoration system for aviation applications

Publication

A. Czyżewski
J. Kotus
A. Kaczmarek
A. Rypulak
A. Pawlik

- Year 2005

W referacie przedstawiono opracowany System Rejestracji I Rekonstrukcji Mowy dla potrzeb lotnictwa. System ten umożliwia jednoczesny zapis, archiwizację i poprawę zrozumiałości sygnału mowy pochodzącego z wielu różnych kanałów komunikacji radiowej. Głównym celem systemu jest rejestracja i rekonstrukcja komunikatów słownych wymienianych drogą radiową pomiędzy pilotem samolotu a stacją kontroli lotów - jest to niezwykle istotne w...

Application of hybrid signals processors to speech and hearing aids

Publication

- Year 2005

Dzięki postępowi w technice Cyfrowych Procesorów Sygnałowych (ang. DSP) stało się możliwe budowanie miniaturowych protez słuchu i mowy. Mimo niewielkich wymiarów procesory te są w stanie wykonywać złożone algorytmy. Ich dodatkową zaletą jest łatwość zmiany oprogramowania, a co za tym idzie łatwość zmiany dziedziny zastosowań. W pracy skupiono się na zagadnieniach związanych z projektowanie i implementacją algorytmów mających zastosowanie...

A Mechatronic System for Building the Map of Optimal Spindle Speeds During High Speed Milling of Flexible Details

Publication

- Year 2011

W pracy przedstawiono mechatroniczny system tworzenia mapy optymalnych prędkości obrotowych wrzeciona w celu nadzorowania drgań typu chatter podczas frezowania szybkościowego przedmiotów podatnych. System ten składa się z części pomiarowej oraz części obliczeniowej, w której wykorzystuje się oprogramowanie autorskie i komercyjne. Na bazie uogólnionego warunku Liao-Younga utworzono mapy optymalnych prędkości obrotowych wrzeciona,...

Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

Publication

D. Korzekwa
R. Barra-Chicote
B. Kostek
T. Drugman
M. Łajszczak

- Year 2019

We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...

Full text available to download

Objectivization of phonological evaluation of speech elements by means of audio parametrization

Publication

- Year 2018

This study addresses two issues related to both machine- and subjective-based speech evaluation by investigating five phonological phenomena related to allophone production. Its aim is to use objective parametrization and phonological classification of the recorded allophones. These allophones were selected as specifically difficult for Polish speakers of English: aspiration, final obstruent devoicing, dark lateral /l/, velar nasal...

Speech codec enhancements utilizing time compression and perceptual coding

Publication

- Year 2007

A method for encoding wideband speech signal employing standardized narrowband speech codecs is presented as well as experimental results concerning detection of tonal spectral components. The speech signal sampled with a higher sampling rate than it is suitable for narrowband coding algorithm is compressed in order to decrease the amount of samples. Next, the time-compressed representation of a signal is encoded using a narrowband...

A hybrid speech codec employing parametric and perceptual coding techniques

Publication

- Year 2006

W referacie przedstawiono hybrydowy kodek mowy dla zastosowan w komunikacji VoIP wykorzystujący kodowanie parametryczne i percetualne. Sygnał mowy jest dzielony na składowe dźwięczne, które podlegają kodowania perceptualnemu, składowe bezdźwięczne, które kodowane są metodą parametryczną oraz transjenty, które nie są kodowane żadną stratną metodą. Dodatkowo przedstawiono architekturę kodeka, w której perceptualnie kodowana i przesyłana...

Auditory-model based robust feature selection for speech recognition

Publication

C. Koniaris
M. Kuropatwinski
W. Kleijn
M. Kuropatwiński

- Journal of the Acoustical Society of America - Year 2010

Full text to download in external service

Virtual keyboard controlled by eye gaze employing speech synthesis

Publication

- Year 2010

The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

Real-time speech streching for supporting hearing impaired schoolchildren

Publication

- Elektronika : konstrukcje, technologie, zastosowania - Year 2010

A study of time scale modification algorithms applied to support hearing impaired schoolchildren is presented. Variety of algorithms are considered, namely: overlap-and add, two variations of synchronous overlapand- add, and the phase vocoder. Their effectiveness as well as real-time processing capabilities are examined.

Full text to download in external service

Noise profiling for speech enhancement employing machine learning models

Publication

K. Kąkol
G. Korvel
B. Kostek

- Journal of the Acoustical Society of America - Year 2022

This paper aims to propose a noise profiling method that can be performed in near real-time based on machine learning (ML). To address challenges related to noise profiling effectively, we start with a critical review of the literature background. Then, we outline the experiment performed consisting of two parts. The first part concerns the noise recognition model built upon several baseline classifiers and noise signal features...

Full text available to download

System Supporting Speech Perception in Special Educational Needs Schoolchildren

Publication

- Year 2012

The system supporting speech perception during the classes is presented in the paper. The system is a combination of portable device, which enables real-time speech stretching, with the workstation designed in order to perform hearing tests. System was designed to help children suffering from Central Auditory Processing Disorders.

Full text to download in external service

Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit

Publication

- Diagnostic Pathology - Year 2012

Methods developed for real-time time scale modification (TSM) of speech signal are presented. They are based onthe non-uniform, speech rate depended SOLA algorithm (Synchronous Overlap and Add). Influence of theproposed method on the intelligibility of speech was investigated for two separate groups of listeners, i.e. hearingimpaired children and elderly listeners. It was shown that for the speech with average rate equal to or...

Full text available to download

Human-computer interactions in speech therapy using a blowing interface

Publication

- Year 2014

In this paper we present a new human-computer interface for the quantitative measurement of blowing activities. The interface can measure the air flow and air pressure during the blowing activity. The measured values are stored and used to control the state of the graphical objects in the graphical user interface. In speech therapy children will find easier to play attractive therapeutic games than to perform repetitive and tedious,...

Full text to download in external service

Distortion of speech signals in the listening area: its mechanism and measurements

Publication

- Year 2014

The paper deals with a problem of the influence of the number and distribution of loudspeakers in speech reinforcement systems on the quality of publicly addressed voice messages, namely on speech intelligibility in the listening area. Linear superposition of time-shifted broadband waves of a same form and slightly different magnitudes that reach a listener from numerous coherent sources, is accompanied by interference effects...

Full text to download in external service

Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions

Publication

- Year 2016

Automatic speech recognition (ASR) is under constant development, especially in cases when speech is casually produced or it is acquired in various environment conditions, or in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is discussed in the presented work as the main focus in this process. ASR is widely implemented in mobile devices technology, but...

Full text to download in external service

Automatic prosodic modification in a Text-To-Speech synthesizer of Polish language

Publication

K. Łopatka
P. Suchomski
A. Czyżewski

- Elektronika : konstrukcje, technologie, zastosowania - Year 2011

Przedstawiono system syntezy mowy polskiej z funkcją automatycznej modyfikacji prozodii wypowiedzi. Opisane zostały metody automatycznego wyznaczania akcentu i intonacji wypowiedzi. Przedstawiono zastosowanie algorytmów przetwarzania sygnału mowy w procesie kształtowania prozodii. Omówiono wpływ zastosowanych modyfikacji na naturalność brzmienia syntezowanego sygnału. Zastosowana metoda oparta jest na algorytmie TD-PSOLA. Opracowany...

Virtual Keyboard controlled by eye gaze employing speech synthesis

Publication

- Elektronika : konstrukcje, technologie, zastosowania - Year 2011

The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

Full text to download in external service

Digital Imaging and Communication in Medicine Standard for Breast Thermal Imaging

Publication

J. Rumiński

- Year 2010

Omówiono znaczenie normy DICOM w obrazowaniu medycznym. Zaproponowano metodę opisu danych w diagnostyce terminczej piersi dla potrzeb normy DICOM. Zaprojektowano i wykonano algorytmy automatycznej konwersji danych oraz tworzenia struktur danych w hierarchiczno-obiektowym modelu danych.

An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics

Publication

G. Korvel
O. Kurasova
B. Kostek

- Year 2019

The speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...

Full text available to download

Analysis of 2D Feature Spaces for Deep Learning-based Speech Recognition

Publication

G. Korvel
P. Treigys
G. Tamulevicus
J. Bernataviciene
B. Kostek

- JOURNAL OF THE AUDIO ENGINEERING SOCIETY - Year 2018

convolutional neural network (CNN) which is a class of deep, feed-forward artificial neural network. We decided to analyze audio signal feature maps, namely spectrograms, linear and Mel-scale cepstrograms, and chromagrams. The choice was made upon the fact that CNN performs well in 2D data-oriented processing contexts. Feature maps were employed in the Lithuanian word recognition task. The spectral analysis led to the highest word...

Time-scale modification of speech signals for supporting hearing impaired schoolchildren

Publication

- Year 2009

A study of time scale modification algorithmsapplied to hearing impaired schoolchildren supporting ispresented. Variety of algorithms are considered, namely:overlap and add, two variations of synchronized overlapand add, and the phase vocoder. Their effectiveness as wellas real-time processing capabilities are examined.

Speech formant frequency and pitch estimation using instantaneous complex frequency

Publication

M. [. Kaniewska

- Year 2008

W pracy opisany został algorytm estymacji częstotliwości podstawowej oraz częstotliwości środkowych i pasm formantów mowy z wykorzystaniem zespolonej pulsacji chwilowej. W artykule przedstawiono również wyniki działania algorytmu dla polskich samogłosek.

High quality speech codec employing sines+noise+transients model

Publication

- Archives of Acoustics - Year 2006

A method of high quality wideband speech signal representation employing sines+transients+noise model is presented. The need for a wideband speech coding approach as well as various methods for analysis and synthesis of sines, residual and transient states of speech signal is discussed. The perceptual criterion is applied in the proposed approach during encoding of sines amplitudes in order to reduce bandwidth requirements and...

Full text available to download

Estimation of the short-term predictor parameters of speech under noisy conditions

Publication

M. Kuropatwinski
W. Kleijn
M. Kuropatwiński

- IEEE Transactions on Audio Speech and Language Processing - Year 2006

Full text to download in external service

Corrupted speech intelligibility improvement using adaptive filter based algorithm

Publication

- Year 2010

A technique for improving the quality of speech signals recorded in strong noise is presented. The proposed algorithmemploying adaptive filtration is described and additional possibilities of speech intelligibility improvement arediscussed. Results of the tests are presented.

SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM

Publication

- Journal of the Acoustical Society of America - Year 2023

The main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...

Full text available to download

Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets

Publication

- Electronics - Year 2022

Artificial Neural Network (ANN) models, specifically Convolutional Neural Networks (CNN), were applied to extract emotions based on spectrograms and mel-spectrograms. This study uses spectrograms and mel-spectrograms to investigate which feature extraction method better represents emotions and how big the differences in efficiency are in this context. The conducted studies demonstrated that mel-spectrograms are a better-suited...

Full text available to download

Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically

Publication

G. Korvel
K. Kąkol
P. Treigys
B. Kostek

- Year 2022

The aim of this study is two-fold. First, we perform a series of experiments to examine the interference of different noises on speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Then, we apply this knowledge to detecting speech with the Lombard effect. This is for preparing a dataset for training a machine learning-based...

Full text available to download

A non-uniform real-time speech time-scale stretching method

Publication

- Year 2011

An algorithm for non-uniform real-time speech stretching is presented. It provides a combination of typical SOLA algorithm (Synchronous Overlap and Add ) with the vowels, consonants and silence detectors. Based on the information about the content and the estimated value of the rate of speech (ROS), the algorithm adapts the scaling factor value. The ability of real-time speech stretching and the resultant quality of voice were...

Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition

Publication

- Year 2016

The problem of accurate differentiating between the speaker utterance and the noise parts in a speech signal is considered. The influence of utilizing a voice activity detection in speech signals on the accuracy of the automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting the voice activity in clean and noisy...

Pursuing the Deep-Learning-Based Classification of Exposed and Imagined Colors from EEG

Publication

A. A. Torres-García
J. S. Garcia Salinas
L. Villaseñor-Pineda

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2022

EEG-based brain-computer interfaces are systems aiming to integrate disabled people into their environments. Nevertheless, their control could not be intuitive or depend on an active external stimulator to generate the responses for interacting with it. Targeting the second issue, a novel paradigm is explored in this paper, which depends on a passive stimulus by measuring the EEG responses of a subject to the primary colors (red,...

Full text to download in external service

Study on Speech Transmission under Varying QoS Parameters in a OFDM Communication System

Publication

M. Zamłyńska
P. Falkowski-Gilski
G. Debita
B. Miedziński

- Year 2021

Although there has been an outbreak of multiple multimedia platforms worldwide, speech communication is still the most essential and important type of service. With the spoken word we can exchange ideas, provide descriptive information, as well as aid to another person. As the amount of available bandwidth continues to shrink, researchers focus on novel types of transmission, based most often on multi-valued modulations, multiple...

Full text to download in external service

A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces

Publication

G. Tamulevicius
G. Korvel
A. B. Yayak
P. Treigys
J. Bernataviciene
B. Kostek

- Electronics - Year 2020

In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset with the size of more than 10.000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...

Full text available to download

Database of speech and facial expressions recorded with optimized face motion capture settings

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2019

The broad objective of the present research is the analysis of spoken English employing a multiplicity of modalities. An important stage of this process, discussed in the paper, is creating a database of speech accompanied with facial expressions. Recordings of speakers were made using an advanced system for capturing facial muscle motion. A brief historical outline, current applications, limitations and the ways of capturing face...

Full text available to download

Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions

Publication

K. Kąkol
G. Korvel
B. Kostek

- Year 2018

The aim of the work is to analyze Lombard speech effect in recordings and then modify the speech signal in order to obtain an increase in the improvement of objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of the Lombard speech, and in particular on the effect of increasing the fundamental frequency...

Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

Publication

- Year 2007

In this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). New iterative algorithm for analysis of ICF of speech signal in presented. Obtained results are compared with commonly used methods to prove its accuracy and connection between ICF and pitch, particularly for narrowband-filtered speech signal.

Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

Publication

- Elektronika : konstrukcje, technologie, zastosowania - Year 2008

In this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). New iterative algorithm for analysis of ICF of speech signal in presented. Obtained results are compared with commonly used methods to prove its accuracy and connection between ICF and pitch, particularly for narrowband-filtered speech signal.

Improving signal quality of a speech codec using hybrid perceptual-parametric algorithm

Publication

- International Journal of Intelligent Information and Database Systems - Year 2008

W artykule zaprezentowano hybrydową architekturę parametryczno-perceptualną kodeka mowy. Jego podstawę stanowi kodek CELP, który wspomagany jest kodekiem perceptualnym. Celem zastosowania proponowanej metody jest uzyskanie poprawy jakości kodowania sygnału mowy. Badaniom poddano dwie architektury, z których w jednej dźwięczne części sygnału rezydualnego kodeka CELP kodowane są perceptualnie. Drugi z proponowanych kodeków dokonuje...

Full text to download in external service

A survey of automatic speech recognition deep models performance for Polish medical terms

Publication

- Year 2023

Among the numerous applications of speech-to-text technology is the support of documentation created by medical personnel. There are many available speech recognition systems for doctors. Their effectiveness in languages such as Polish should be verified. In connection with our project in this field, we decided to check how well the popular speech recognition systems work, employing models trained for the general Polish language....

Full text to download in external service

Computer-assisted pronunciation training—Speech synthesis is almost all you need

Publication

D. Korzekwa
J. Lorenzo-trueba
T. Drugman
B. Kostek

- SPEECH COMMUNICATION - Year 2022

The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high...

Full text available to download

Elimination of clicks from archive speech signals using sparse autoregressive modeling

Publication

- Year 2012

This paper presents a new approach to elimination of impulsivedisturbances from archive speech signals. The proposedsparse autoregressive (SAR) signal representation is given ina factorized form - the model is a cascade of the so-called formantfilter and pitch filter. Such a technique has been widelyused in code-excited linear prediction (CELP) systems, as itguarantees model stability. After detection of noise pulses usinglinear...

Full text to download in external service

Combining visual and acoustic modalities to ease speech recognition by hearing impaired people

Publication

- Year 2005

Artykuł prezentuje system, którego celem działania jest ułatwienie procesu treningu poprawnej wymowy dla osób z poważnymi wadami słuchu. W analizie mowy wykorzystane zostały parametry akutyczne i wizualne. Do wyznaczenia parametrów wizualnych na podstawie kształtu i ruchu ust zostały wykorzystane modele Active Shape Models. Parametry akustyczne bazują na współczynnikach melcepstralnych. Do klasyfikacji wypowiadanych głosek została...

Documenting the de-identification process of clinical and imaging data for AI for health imaging projects

Publication

H. Kondylakis
R. Catalan
S. Alabart
C. Barelle
P. Bizopoulos
M. Bobowicz
J. Bona
D. Fotiadis
T. Garcia
I. Gomez... and 18 others

- Insights into Imaging - Year 2024

Full text to download in external service

Search

Filters

Catalog

Category

Year

Options

Search results for: IMAGINED SPEECH