Wyniki wyszukiwania dla: SPEECH PROCESSING

Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement

Publikacja

G. Korvel
K. Kąkol
O. Kurasova
B. Kostek

- IEEE Access - Rok 2020

The Lombard effect is one of the most well-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech...

Pełny tekst do pobrania w portalu

Improving the quality of speech in the conditions of noise and interference

Publikacja

B. Kostek
K. Kąkol

- Journal of the Acoustical Society of America - Rok 2018

The aim of the work is to present a method of intelligent modification of the speech signal with speech features expressed in noise, based on the Lombard effect. The recordings utilized sets of words and sentences as well as disturbing signals, i.e., pink noise and the so-called babble speech. Noise signal, calibrated to various levels at the speaker's ears, was played over two loudspeakers located 2 m away from the speaker. In...

Pełny tekst do pobrania w serwisie zewnętrznym

Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning

Publikacja

K. Kąkol

- Rok 2023

The Lombard effect is a phenomenon that results in speech intelligibility improvement when applied to noise. There are many distinctive features of Lombard speech that were recalled in this dissertation. This work proposes the creation of a system capable of improving speech quality and intelligibility in real-time measured by objective metrics and subjective tests. This system consists of three main components: speech type detection,...

Pełny tekst do pobrania w portalu

Improving Objective Speech Quality Indicators in Noise Conditions

Publikacja

K. Kąkol
G. Korvel
B. Kostek

- Rok 2020

This work aims at modifying speech signal samples and test them with objective speech quality indicators after mixing the original signals with noise or with an interfering signal. Modifications that are applied to the signal are related to the Lombard speech characteristics, i.e., pitch shifting, utterance duration changes, vocal tract scaling, manipulation of formants. A set of words and sentences in Polish, recorded in silence,...

Pełny tekst do pobrania w serwisie zewnętrznym

Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

Publikacja

D. Korzekwa
R. Barra-Chicote
B. Kostek
T. Drugman
M. Łajszczak

- Rok 2019

We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...

Pełny tekst do pobrania w portalu

Applying the Lombard Effect to Speech-in-Noise Communication

Publikacja

G. Korvel
K. Kąkol
P. Treigys
B. Kostek

- Electronics - Rok 2023

This study explored how the Lombard effect, a natural or artificial increase in speech loudness in noisy environments, can improve speech-in-noise communication. This study consisted of several experiments that measured the impact of different types of noise on synthesizing the Lombard effect. The main steps were as follows: first, a dataset of speech samples with and without the Lombard effect was collected in a controlled setting;...

Pełny tekst do pobrania w portalu

Constructing a Dataset of Speech Recordingswith Lombard Effect

Publikacja

D. Weber
S. Zaporowski
D. Korzekwa

- Rok 2020

Thepurpose of therecordings was to create a speech corpus based on the ISLEdataset, extended with video and Lombard speech. Selected from a set of 165sentences, 10, evaluatedas having thehighest possibility to occur in the context ofthe Lombard effect,were repeated in the presence of the so-called babble speech to obtain Lombard speech features. Altogether,15speakers were recorded, and speech parameterswere...

Improved method for real-time speech stretching

Publikacja

- Rok 2012

n algorithm for real-time speech stretching is presented. It was designed to modify input signal dependently on its content and on its relation with the historical input data. The proposed algorithm is a combination of speech signal analysis algorithms, i.e. voice, vowels/consonants, stuttering detection and SOLA (Synchronous-Overlap-and-Add) based speech stretching algorithm. This approach enables stretching input speech signal...

Pełny tekst do pobrania w serwisie zewnętrznym

Speech codec enhancements utilizing time compression and perceptual coding

Publikacja

- Rok 2007

A method for encoding wideband speech signal employing standardized narrowband speech codecs is presented as well as experimental results concerning detection of tonal spectral components. The speech signal sampled with a higher sampling rate than it is suitable for narrowband coding algorithm is compressed in order to decrease the amount of samples. Next, the time-compressed representation of a signal is encoded using a narrowband...

Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition

Publikacja

- Rok 2016

The problem of accurate differentiating between the speaker utterance and the noise parts in a speech signal is considered. The influence of utilizing a voice activity detection in speech signals on the accuracy of the automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting the voice activity in clean and noisy...

Methodology and technology for the polymodal allophonic speech transcription

Publikacja

- Journal of the Acoustical Society of America - Rok 2016

A method for automatic audiovisual transcription of speech employing: acoustic and visual speech representations is developed. It adopts a combining of audio and visual modalities, which provide a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e. the changes in the articulatory setting of speech organs for...

Pełny tekst do pobrania w serwisie zewnętrznym

Methodology and technology for the polymodal allophonic speech transcription

Publikacja

- Journal of the Acoustical Society of America - Rok 2016

A method for automatic audiovisual transcription of speech employing: acoustic, electromagnetical articulography and visual speech representations is developed. It adopts a combining of audio and visual modalities, which provide a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e., the changes in the articulatory...

Pełny tekst do pobrania w serwisie zewnętrznym

Detecting Lombard Speech Using Deep Learning Approach

Publikacja

K. Kąkol
G. Korvel
G. Tamulevicius
B. Kostek

- SENSORS - Rok 2023

Robust Lombard speech-in-noise detecting is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The framework proposed combines convolutional neural networks...

Pełny tekst do pobrania w portalu

Computer-assisted pronunciation training—Speech synthesis is almost all you need

Publikacja

D. Korzekwa
J. Lorenzo-trueba
T. Drugman
B. Kostek

- SPEECH COMMUNICATION - Rok 2022

The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high...

Pełny tekst do pobrania w portalu

Transient detection for speech coding applications

Publikacja

- International Journal of Computer Science and Network Security - Rok 2006

Signal quality in speech codecs may be improved by selecting transients from speech signal and encoding them using a suitable method. This paper presents an algorithm for transient detection in speech signal. This algorithm operates in several frequency bands. Transient detection functions are calculated from energy measured in short frames of the signal. The final selection of transient frames is based on results of detection...

Pełny tekst do pobrania w serwisie zewnętrznym

Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit

Publikacja

- Diagnostic Pathology - Rok 2012

Methods developed for real-time time scale modification (TSM) of speech signal are presented. They are based onthe non-uniform, speech rate depended SOLA algorithm (Synchronous Overlap and Add). Influence of theproposed method on the intelligibility of speech was investigated for two separate groups of listeners, i.e. hearingimpaired children and elderly listeners. It was shown that for the speech with average rate equal to or...

Pełny tekst do pobrania w portalu

Optimizing Medical Personnel Speech Recognition Models Using Speech Synthesis and Reinforcement Learning

Publikacja

A. Czyżewski

- Journal of the Acoustical Society of America - Rok 2023

Text-to-Speech synthesis (TTS) can be used to generate training data for building Automatic Speech Recognition models (ASR). Access to medical speech data is because it is sensitive data that is difficult to obtain for privacy reasons; TTS can help expand the data set. Speech can be synthesized by mimicking different accents, dialects, and speaking styles that may occur in a medical language. Reinforcement Learning (RL), in the...

Pełny tekst do pobrania w portalu

A survey of automatic speech recognition deep models performance for Polish medical terms

Publikacja

- Rok 2023

Among the numerous applications of speech-to-text technology is the support of documentation created by medical personnel. There are many available speech recognition systems for doctors. Their effectiveness in languages such as Polish should be verified. In connection with our project in this field, we decided to check how well the popular speech recognition systems work, employing models trained for the general Polish language....

Pełny tekst do pobrania w serwisie zewnętrznym

Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions

Publikacja

- Rok 2016

Automatic speech recognition (ASR) is under constant development, especially in cases when speech is casually produced or it is acquired in various environment conditions, or in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is discussed in the presented work as the main focus in this process. ASR is widely implemented in mobile devices technology, but...

Pełny tekst do pobrania w serwisie zewnętrznym

High quality speech codec employing sines+noise+transients model

Publikacja

- Archives of Acoustics - Rok 2006

A method of high quality wideband speech signal representation employing sines+transients+noise model is presented. The need for a wideband speech coding approach as well as various methods for analysis and synthesis of sines, residual and transient states of speech signal is discussed. The perceptual criterion is applied in the proposed approach during encoding of sines amplitudes in order to reduce bandwidth requirements and...

Pełny tekst do pobrania w portalu

A Method of Real-Time Non-uniform Speech Stretching

Publikacja

- Rok 2012

Developed method of real-time non-uniform speech stretching is presented.The proposed solution is based on the well-known SOLA algorithm(Synchronous Overlap and Add). Non-uniform time-scale modification isachieved by the adjustment of time scaling factor values in accordance with thesignal content. Dependently on the speech unit (vowels/consonants), instantaneousrate of speech (ROS), and speech signal presence, values of the scalingfactor...

Pełny tekst do pobrania w serwisie zewnętrznym

Mispronunciation Detection in Non-Native (L2) English with Uncertainty Modeling

Publikacja

D. Korzekwa
J. Lorenzo-trueba
S. Zaporowski
S. Calamaro
T. Drugman
B. Kostek

- Rok 2021

A common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare it to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: a) phonemes can be recognized from speech with high accuracy, b) there is a single correct way for a sentence to be pronounced. These assumptions do not always hold, which can result...

Pełny tekst do pobrania w serwisie zewnętrznym

Mowa nienawiści (hate speech) a odpowiedzialność dostawców usług internetowych w orzecznictwie sądów europejskich

Publikacja

K. Kowalik-Bańczyk

- Rok 2015

The article analyses the phenomenon of hate speech in the Internet contrasted with the problem of responsability of Internet Service Providers for cases of such abuses of freedom of expression. The text provides an analysis of jurisprudence of two European Courts. On the one hand it presents the position of the European Court of Human Rights on the problem of hate speech: its definition and the liability for it as an exception...

An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics

Publikacja

G. Korvel
O. Kurasova
B. Kostek

- Rok 2019

The speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...

Pełny tekst do pobrania w portalu

Automated detection of pronunciation errors in non-native English speech employing deep learning

Publikacja

D. Korzekwa

- Rok 2023

Despite significant advances in recent years, the existing Computer-Assisted Pronunciation Training (CAPT) methods detect pronunciation errors with a relatively low accuracy (precision of 60% at 40%-80% recall). This Ph.D. work proposes novel deep learning methods for detecting pronunciation errors in non-native (L2) English speech, outperforming the state-of-the-art method in AUC metric (Area under the Curve) by 41%, i.e., from...

Pełny tekst do pobrania w portalu

Corrupted speech intelligibility improvement using adaptive filter based algorithm

Publikacja

- Rok 2010

A technique for improving the quality of speech signals recorded in strong noise is presented. The proposed algorithmemploying adaptive filtration is described and additional possibilities of speech intelligibility improvement arediscussed. Results of the tests are presented.

A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces

Publikacja

G. Tamulevicius
G. Korvel
A. B. Yayak
P. Treigys
J. Bernataviciene
B. Kostek

- Electronics - Rok 2020

In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset with the size of more than 10.000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...

Pełny tekst do pobrania w portalu

Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System

Publikacja

P. Falkowski-Gilski
G. Debita
M. Habrych
B. Miedziński
P. Jedlikowski
B. Polnik
J. Wandzio
X. Wang

- Rok 2020

The broadband over power line – power line communication (BPL-PLC) cable is resistant to electricity stoppage and partial damage of phase conductors. It maintains continuity of transmission in case of an emergency. These features make it an ideal solution for delivering data, e.g. in an underground mine environment, especially clear and easily understandable voice messages. This paper describes a subjective quality evaluation of...

Pełny tekst do pobrania w serwisie zewnętrznym

Study on Speech Transmission under Varying QoS Parameters in a OFDM Communication System

Publikacja

M. Zamłyńska
P. Falkowski-Gilski
G. Debita
B. Miedziński

- Rok 2021

Although there has been an outbreak of multiple multimedia platforms worldwide, speech communication is still the most essential and important type of service. With the spoken word we can exchange ideas, provide descriptive information, as well as aid to another person. As the amount of available bandwidth continues to shrink, researchers focus on novel types of transmission, based most often on multi-valued modulations, multiple...

Pełny tekst do pobrania w serwisie zewnętrznym

Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech

Publikacja

D. Korzekwa
J. Lorenzo-trueba
T. Drugman
S. Calamaro
B. Kostek

- Rok 2021

We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...

Pełny tekst do pobrania w portalu

Examining Influence of Distance to Microphone on Accuracy of Speech Recognition

Publikacja

- Rok 2015

The problem of controlling a machine by the distant-talking speaker without a necessity of handheld or body-worn equipment usage is considered. A laboratory setup is introduced for examination of performance of the developed automatic speech recognition system fed by direct and by distant speech acquired by microphones placed at three different distances from the speaker (0.5 m to 1.5 m). For feature extraction from the voice signal...

Pełny tekst do pobrania w serwisie zewnętrznym

Speech Intelligibility Measurements in Auditorium

Publikacja

K. Leo

- ACTA PHYSICA POLONICA A - Rok 2010

Speech intelligibility was measured in Auditorium Novum on Technical University of Gdansk (seating capacity 408, volume 3300 m3). Articulation tests were conducted; STI and Early Decay Time EDT coefficients were measured. Negative noise contribution to speech intelligibility was taken into account. Subjective measurements and objective tests reveal high speech intelligibility at most seats in auditorium. Correlation was found between...

Pełny tekst do pobrania w portalu

Comparison of various speech time-scale modificartion methods

Publikacja

- Archives of Acoustics - Rok 2011

The objective of this work is to investigate the influence of the different time-scale modification (TSM) methods on the quality of the speech stretched up using the designed non-uniform real-time speech time-scale modification algorithm (NU-RTSM). The algorithm provides a combination of the typical TSM algorithm with the vowels, consonants, stutter, transients and silence detectors. Based on the information about the content and...

A non-uniform real-time speech time-scale stretching method

Publikacja

- Rok 2011

An algorithm for non-uniform real-time speech stretching is presented. It provides a combination of typical SOLA algorithm (Synchronous Overlap and Add ) with the vowels, consonants and silence detectors. Based on the information about the content and the estimated value of the rate of speech (ROS), the algorithm adapts the scaling factor value. The ability of real-time speech stretching and the resultant quality of voice were...

Enhanced voice user interface employing spatial filtration of signals from acoustic vector sensor

Publikacja

- Rok 2015

Spatial filtration of sound is introduced to enhance speech recognition accuracy in noisy conditions. An acoustic vector sensor (AVS) is employed. The signals from the AVS probe are processed in order to attenuate the surrounding noise. As a result the signal to noise ratio is increased. An experiment is featured in which speech signals are disturbed by babble noise. The signals before and after spatial filtration are processed...

Pełny tekst do pobrania w serwisie zewnętrznym

Language Models in Speech Recognition

Publikacja

J. Daciuk

- Rok 2022

This chapter describes language models used in speech recognition, It starts by indicating the role and the place of language models in speech recognition. Mesures used to compare language models follow. An overview of n-gram, syntactic, semantic, and neural models is given. It is accompanied by a list of popular software.

Pełny tekst do pobrania w serwisie zewnętrznym

Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention

Publikacja

D. Korzekwa
R. Barra-Chicote
S. Zaporowski
G. Beringer
J. Lorenzo-trueba
A. Serafinowicz
J. Droppo
T. Drugman
B. Kostek

- Rok 2021

This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...

Pełny tekst do pobrania w portalu

Zastosowanie spowalniania wypowiedzi w celu poprawy rozumienia mowy przez dzieci w szkole

Publikacja

- Rok 2009

This paper presents a time-scale modification algorithms that could be used for hearing impairment therapy supported by real-time speech stretching. In this paper the OLA based algorithms and Phase Vocoder were described. In the experimental part usability of those algorithms for real-time speech stretching was discussed

Virtual keyboard controlled by eye gaze employing speech synthesis

Publikacja

- Rok 2010

The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

Virtual Keyboard controlled by eye gaze employing speech synthesis

Publikacja

- Elektronika : konstrukcje, technologie, zastosowania - Rok 2011

The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

Pełny tekst do pobrania w serwisie zewnętrznym

Language material for English audiovisual speech recognition system developmen . Materiał językowy do wykorzystania w systemie audiowizualnego rozpoznawania mowy angielskiej

Publikacja

A. Czyżewski
B. Kostek
T. Ciszewski
D. Majewicz

- Rok 2013

The bi-modal speech recognition system requires a 2-sample language input for training and for testing algorithms which precisely depicts natural English speech. For the purposes of the audio-visual recordings, a training data base of 264 sentences (1730 words without repetitions; 5685 sounds) has been created. The language sample reflects vowel and consonant frequencies in natural speech. The recording material reflects both the...

Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

Publikacja

- Elektronika : konstrukcje, technologie, zastosowania - Rok 2008

In this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). New iterative algorithm for analysis of ICF of speech signal in presented. Obtained results are compared with commonly used methods to prove its accuracy and connection between ICF and pitch, particularly for narrowband-filtered speech signal.

Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

Publikacja

- Rok 2007

In this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). New iterative algorithm for analysis of ICF of speech signal in presented. Obtained results are compared with commonly used methods to prove its accuracy and connection between ICF and pitch, particularly for narrowband-filtered speech signal.

Distortion of speech signals in the listening area: its mechanism and measurements

Publikacja

- Rok 2014

The paper deals with a problem of the influence of the number and distribution of loudspeakers in speech reinforcement systems on the quality of publicly addressed voice messages, namely on speech intelligibility in the listening area. Linear superposition of time-shifted broadband waves of a same form and slightly different magnitudes that reach a listener from numerous coherent sources, is accompanied by interference effects...

Pełny tekst do pobrania w serwisie zewnętrznym

Creating new voices using normalizing flows

Publikacja

P. Biliński
T. Merritt
A. Ezzerg
K. Pokora
S. Cygert
K. Yanagisawa
R. Barra-Chicote
D. Korzekwa

- Rok 2022

Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. Firstly, we create an approach for TTS...

Pełny tekst do pobrania w portalu

PHONEME DISTORTION IN PUBLIC ADDRESS SYSTEMS

Publikacja

- Rok 2015

The quality of voice messages in speech reinforcement and public address systems is often poor. The sound engineering projects of such systems take care of sound intensity and possible reverberation phenomena in public space without, however, considering the influence of acoustic interference related to the number and distribution of loudspeakers. This paper presents the results of measurements and numerical simulations of the...

Human voice modification using instantaneous complex frequency

Publikacja

M. Kaniewska

- Rok 2010

The paper presents the possibilities of changing human voice by modifying instantaneous complex frequency (ICF) of the speech signal. The proposed method provides a flexible way of altering voice without the necessity of finding fundamental frequency and formants' positions or detecting voiced and unvoiced fragments of speech. The algorithm is simple and fast. Apart from ICF it uses signal factorization into two factors: one fully...

Hybrid of Neural Networks and Hidden Markov Models as a modern approach to speech recognition systems

Publikacja

- Pomiary Automatyka Robotyka - Rok 2013

The aim of this paper is to present a hybrid algorithm that combines the advantages ofartificial neural networks and hidden Markov models in speech recognition for control purpos-es. The scope of the paper includes review of currently used solutions, description and analysis of implementation of selected artificial neural network (NN) structures and hidden Markov mod-els (HMM). The main part of the paper consists of a description...

Pełny tekst do pobrania w portalu

Ranking Speech Features for Their Usage in Singing Emotion Classification

Publikacja

- Rok 2020

This paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...

Pełny tekst do pobrania w portalu

Investigating Feature Spaces for Isolated Word Recognition

Publikacja

P. Treigys
G. Korvel
G. Tamulevicius
J. Bernataviciene
B. Kostek

- Rok 2020

The study addresses the issues related to the appropriateness of a two-dimensional representation of speech signal for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and...

Pełny tekst do pobrania w serwisie zewnętrznym

Wyszukiwarka

Filtry

Katalog

Kategoria

Rok

Opcje

Wyniki wyszukiwania dla: SPEECH PROCESSING