Wyniki wyszukiwania dla: SPEECH SYNTHESIS

Wyniki wyszukiwania dla: SPEECH SYNTHESIS

wyników na stronę:
osadź ten widok na swojej stronie

Filtry

wszystkich: 213

wyczyść wszystkie filtry niedostępne

Speech formant frequency and pitch estimation using instantaneous complex frequency
Publikacja
- M. [. Kaniewska
- Rok 2008
W pracy opisany został algorytm estymacji częstotliwości podstawowej oraz częstotliwości środkowych i pasm formantów mowy z wykorzystaniem zespolonej pulsacji chwilowej. W artykule przedstawiono również wyniki działania algorytmu dla polskich samogłosek.
Time-scale modification of speech signals for supporting hearing impaired schoolchildren
Publikacja
- A. Kupryjanow
- A. Czyżewski
- Rok 2009
A study of time scale modification algorithmsapplied to hearing impaired schoolchildren supporting ispresented. Variety of algorithms are considered, namely:overlap and add, two variations of synchronized overlapand add, and the phase vocoder. Their effectiveness as wellas real-time processing capabilities are examined.
Analysis of 2D Feature Spaces for Deep Learning-based Speech Recognition
Publikacja
- G. Korvel
- P. Treigys
- G. Tamulevicus
- J. Bernataviciene
- B. Kostek
- JOURNAL OF THE AUDIO ENGINEERING SOCIETY - Rok 2018
convolutional neural network (CNN) which is a class of deep, feed-forward artificial neural network. We decided to analyze audio signal feature maps, namely spectrograms, linear and Mel-scale cepstrograms, and chromagrams. The choice was made upon the fact that CNN performs well in 2D data-oriented processing contexts. Feature maps were employed in the Lithuanian word recognition task. The spectral analysis led to the highest word...
Bimodal classification of English allophones employing acoustic speech signal and facial motion capture
Publikacja
- Journal of the Acoustical Society of America - Rok 2018
A method for automatic transcription of English speech into International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent the visual data related to lip reading can enhance recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...

Pełny tekst do pobrania w serwisie zewnętrznym
Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System
Publikacja
- P. Falkowski-Gilski
- G. Debita
- M. Habrych
- B. Miedziński
- P. Jedlikowski
- B. Polnik
- J. Wandzio
- X. Wang
- Rok 2020
The broadband over power line – power line communication (BPL-PLC) cable is resistant to electricity stoppage and partial damage of phase conductors. It maintains continuity of transmission in case of an emergency. These features make it an ideal solution for delivering data, e.g. in an underground mine environment, especially clear and easily understandable voice messages. This paper describes a subjective quality evaluation of...

Pełny tekst do pobrania w serwisie zewnętrznym
A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times
Publikacja
- SENSORS - Rok 2022
Objective assessment of speech intelligibility is a complex task that requires taking into account a number of factors such as different perception of each speech sub-bands by the human hearing sense or different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method used for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way...

Pełny tekst do pobrania w portalu
Mowa nienawiści (hate speech) a odpowiedzialność dostawców usług internetowych w orzecznictwie sądów europejskich
Publikacja
- K. Kowalik-Bańczyk
- Rok 2015
The article analyses the phenomenon of hate speech in the Internet contrasted with the problem of responsability of Internet Service Providers for cases of such abuses of freedom of expression. The text provides an analysis of jurisprudence of two European Courts. On the one hand it presents the position of the European Court of Human Rights on the problem of hate speech: its definition and the liability for it as an exception...
Difference in Perceived Speech Signal Quality Assessment Among Monolingual and Bilingual Teenage Students
Publikacja
- P. Falkowski-Gilski
- Rok 2021
The user perceived quality is a mixture of factors, including the background of an individual. The process of auditory perception is discussed in a wide variety of fields, ranging from engineering to medicine. Many studies examine the difference between musicians and non-musicians. Since musical training develops musical hearing and other various auditory capabilities, similar enhancements should be observable in case of bilingual...

Pełny tekst do pobrania w serwisie zewnętrznym
Intra-subject class-incremental deep learning approach for EEG-based imagined speech recognition
Publikacja
- J. S. Garcia Salinas
- A. A. Torres-García
- C. A. Reyes-Garćia
- L. Villaseñor-Pineda
- Biomedical Signal Processing and Control - Rok 2023
Brain–computer interfaces (BCIs) aim to decode brain signals and transform them into commands for device operation. The present study aimed to decode the brain activity during imagined speech. The BCI must identify imagined words within a given vocabulary and thus perform the requested action. A possible scenario when using this approach is the gradual addition of new words to the vocabulary using incremental learning methods....

Pełny tekst do pobrania w serwisie zewnętrznym
Improving signal quality of a speech codec using hybrid perceptual-parametric algorithm
Publikacja
- International Journal of Intelligent Information and Database Systems - Rok 2008
W artykule zaprezentowano hybrydową architekturę parametryczno-perceptualną kodeka mowy. Jego podstawę stanowi kodek CELP, który wspomagany jest kodekiem perceptualnym. Celem zastosowania proponowanej metody jest uzyskanie poprawy jakości kodowania sygnału mowy. Badaniom poddano dwie architektury, z których w jednej dźwięczne części sygnału rezydualnego kodeka CELP kodowane są perceptualnie. Drugi z proponowanych kodeków dokonuje...

Pełny tekst do pobrania w serwisie zewnętrznym
Combining visual and acoustic modalities to ease speech recognition by hearing impaired people
Publikacja
- B. Kostek
- P. Dalka
- Rok 2005
Artykuł prezentuje system, którego celem działania jest ułatwienie procesu treningu poprawnej wymowy dla osób z poważnymi wadami słuchu. W analizie mowy wykorzystane zostały parametry akutyczne i wizualne. Do wyznaczenia parametrów wizualnych na podstawie kształtu i ruchu ust zostały wykorzystane modele Active Shape Models. Parametry akustyczne bazują na współczynnikach melcepstralnych. Do klasyfikacji wypowiadanych głosek została...
EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY
Publikacja
- Rok 2014
The problem of video framerate and audio/video synchronization in audio-visual speech recognition is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...
EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY
Publikacja
- Rok 2014
The problem of video framerate and audio/video synchronization in audio-visual speech recogni-tion is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...
Biometria i przetwarzanie mowy 2023
Kursy Online
- J. Daciuk
{mlang pl} Celem kursu jest zapoznanie studentów z: metodami ustalania i potwierdzania tożsamości ludzi na podstawie mierzalnych cech organizmu cechami mowy ludzkiej, w szczególności polskiej metodami rozpoznawania mowy metodami syntezy mowy {mlang} {mlang en} The aim of the course is to familiarize the students with: methods of identification and verification of identity of people based on measurable features of their...
Biometria i przetwarzanie mowy 2024
Kursy Online
- J. Daciuk
{mlang pl} Celem kursu jest zapoznanie studentów z: metodami ustalania i potwierdzania tożsamości ludzi na podstawie mierzalnych cech organizmu cechami mowy ludzkiej, w szczególności polskiej metodami rozpoznawania mowy metodami syntezy mowy {mlang} {mlang en} The aim of the course is to familiarize the students with: methods of identification and verification of identity of people based on measurable features of their...
Stochastic Integration and Long Term Predictor Estimation under Noisy Conditions for Speech Enhancement
Publikacja
- M. Kuropatwinski
- W. Kleijn
- M. Kuropatwiński
- Rok 2005
Pełny tekst do pobrania w serwisie zewnętrznym
Estimation of time-frequency complex phase-based speech attributes using narrow band filter banks
Publikacja
- K. Abratkiewicz
- K. Czarnecki
- D. Fourer
- F. Auger
- Rok 2017
In this paper, we present nonlinear estimators of nonstationary and multicomponent signal attributes (parameters, properties) which are instantaneous frequency, spectral (or group) delay, and chirp-rate (also known as instantaneous frequency slope). We estimate all of these distributions in the time-frequency domain using both finite and infinite impulse response (FIR and IIR) narrow band filers for speech analysis. Then, we present...

Pełny tekst do pobrania w portalu
Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine
Publikacja
- P. Falkowski-Gilski
- G. Debita
- Archives of Acoustics - Rok 2023
In order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as: available bandwidth, bit error rate (BER), delay, latency as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...

Pełny tekst do pobrania w portalu
Akustyczny obraz słowa na tle mowy etnicznej [The acoustic image of ethnic speech words]
Publikacja
- K. Wojan
- Rok 2002
Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning
Publikacja
- K. Kąkol
- Rok 2023
The Lombard effect is a phenomenon that results in speech intelligibility improvement when applied to noise. There are many distinctive features of Lombard speech that were recalled in this dissertation. This work proposes the creation of a system capable of improving speech quality and intelligibility in real-time measured by objective metrics and subjective tests. This system consists of three main components: speech type detection,...

Pełny tekst do pobrania w portalu
The development of speech in early childhood in children from twin pregnancies with twin-twin transfusion syndrome (TTTS)
Publikacja
- M. Bidzan
- Ł. Bieleninik
- M. Lipowska
- Polish Psychological Bulletin - Rok 2013
Pełny tekst do pobrania w serwisie zewnętrznym
Minimum mean square error estimation of speech short-term predictor parameters under noisy conditions
Publikacja
- M. Kuropatwinski
- W. Kleijn
- M. Kuropatwiński
- Rok 2003
Pełny tekst do pobrania w serwisie zewnętrznym
Cyfrowa analiza mowy etnicznej – ekstrakcja kodu informacji [A digital analysis of ethnic speech – deciphering the information code]
Publikacja
- K. Wojan
- Rok 2003
Language material for English audiovisual speech recognition system developmen . Materiał językowy do wykorzystania w systemie audiowizualnego rozpoznawania mowy angielskiej
Publikacja
- A. Czyżewski
- B. Kostek
- T. Ciszewski
- D. Majewicz
- Rok 2013
The bi-modal speech recognition system requires a 2-sample language input for training and for testing algorithms which precisely depicts natural English speech. For the purposes of the audio-visual recordings, a training data base of 264 sentences (1730 words without repetitions; 5685 sounds) has been created. The language sample reflects vowel and consonant frequencies in natural speech. The recording material reflects both the...
Цифровой анализ сигналов речи как инструмент сравнительного языкознания [A digital analysis of speech signals as an instrument in comparative linguistics]
Publikacja
- K. Wojan
- Rok 2003
System przetwarzania i wizualizacji sygnału mowy dla potrzeb lingwistycznych = System of speech signal processing and visualisation of the results
Publikacja
- Z. Wojan
- W. Lis
- K. Wojan
- Rok 2005
W artykule przedstawiono sposób przetwarzania i wizualizacji sygnału mowy w formie prostego w obsłudze i relatywnie niedrogiego urządzenia do nagrywania sygnału akustycznego oraz przetwarzania cyfrowego wyselekcjonowanych fragmentów i wizualizacji uzyskanych rezultatów przekształceń. Zastosowano do tego celu komputer z kartą dźwiękową. Przetwarzanie cyfrowe oraz wizualizacja dokonywana była w oparciu o program MATLAB bezpośrednio...
System przetwarzania i wizualizacji sygnału mowy dla potrzeb lingwistycznych [A system of speech signal processing and visualisation for linguistic purposes]
Publikacja
- K. Wojan
- Rok 2005
Smartphone application supporting independent movement of the blind
Publikacja
- Rok 2011
Improving comfort of life of blind people is a problem of great importance. Neither a white canenor a guide dog, although both very useful, can be considered as a tool for achieving fullindependence in everyday movement around the city. On the market there are some navigation toolsinspired by car navigation systems, but they have many flaws, ranging from positioninginaccuracies to high prices. The authors present their own solution...
High quality speech coding using combined parametric and perceptual modules. [Kodowanie sygnału mowy z zachowaniem wysokiej jakości przy wykorzystaniu modułu parametrycznego i perceptualnego]
Publikacja
- Transaction on Engineering, Computation and Technology - Rok 2006
W komunikacie zaprezentowano nową metodę hybrydowego kodowania sygnału mowy. Techniki kodowania parametrycznego oraz perceptualnego zostały wykorzystane w celu zapewnienia wysokiej jakości kodowania sygnału mowy. Przedstawiono wyniki badań dla dwóch architektur kodeka. Jedna z nich bazuje na algorytmie pozwalajacym wyodrębnić składowe dźwięczne, bezdźwięczne oraz transjenty. Składowe dźwięczne kodowane są metodą perceptualną, bezdźwięczne...

Pełny tekst do pobrania w serwisie zewnętrznym
Improving signal quality in speech codec using hybrid perceptual-parametric algorithm. [Poprawa jakości sygnału w kodekach mowy przy użyciu hybrydowego, parametryczno-perceptualnego algorytmu kodowania]
Publikacja
- Rok 2006
Przedstawiono hybrydową, parametryczno-perceptualną architekturę kodeka. Podstawowa struktura kodeka parametrycznego CELP została wzbogacona o kodowanie perceptualne. Celem hybrydyzacji kodeka jest uzyskanie znaczącej poprawy subiektywnej jakości zdekodowanego sygnału. Zaproponowano dwie hybrydowe struktury. Pierwsza polega na perceptualnym kodowaniu dźwięcznych elementów sygnału rezydualnego kodeka CELP. Druga metoda dzieli sygnał...
Badanie rozkładów parametrów sygnału mowy w zastosowaniach do prognozowania prawdopodobieństwa popełnienia błędów w systemach identyfikacji mówców = Examining distribution of speech signal parameters for the prognosis of error probability in speaker verification systems
Publikacja
- A. Kaczmarek
- Rok 2010
Przedmiotem pracy jest system identyfikacji mówców w sposób zależny od tekstu ("text dependent''). Dokonano analizy wielu różnych wypowiedzi kilkudziesięciu mówców. Zastosowana metoda parametryzacji to metoda oparta na wynikach analizy cepstralnej sygnału mowy. Zdefiniowane zostały nowe parametry skojarzone z elementarnymi zdarzeniami w procesie weryfikacji mówców. Na tej podstawie dokonano estymacji funkcji gęstości prawdopodobieństwa...
Orken Mamyrbayev Professor

Osoby

1. Education: Higher. In 2001, graduated from the Abay Almaty State University (now Abay Kazakh National Pedagogical University), in the specialty: Computer science and computerization manager. 2. Academic degree: Ph.D. in the specialty "6D070300-Information systems". The dissertation was defended in 2014 on the topic: "Kazakh soileulerin tanudyn kupmodaldy zhuyesin kuru". Under my supervision, 16 masters, 1 dissertation...
Investigating Feature Spaces for Isolated Word Recognition
Publikacja
- G. Korvel
- G. Tamulevicus
- P. Treigys
- J. Bernataviciene
- B. Kostek
- Rok 2018
Much attention is given by researchers to the speech processing task in automatic speech recognition (ASR) over the past decades. The study addresses the issue related to the investigation of the appropriateness of a two-dimensional representation of speech feature spaces for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and timefrequency signal representation...
Enhanced voice user interface employing spatial filtration of signals from acoustic vector sensor
Publikacja
- Rok 2015
Spatial filtration of sound is introduced to enhance speech recognition accuracy in noisy conditions. An acoustic vector sensor (AVS) is employed. The signals from the AVS probe are processed in order to attenuate the surrounding noise. As a result the signal to noise ratio is increased. An experiment is featured in which speech signals are disturbed by babble noise. The signals before and after spatial filtration are processed...

Pełny tekst do pobrania w serwisie zewnętrznym
Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention
Publikacja
- D. Korzekwa
- R. Barra-Chicote
- S. Zaporowski
- G. Beringer
- J. Lorenzo-trueba
- A. Serafinowicz
- J. Droppo
- T. Drugman
- B. Kostek
- Rok 2021
This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...

Pełny tekst do pobrania w portalu
Zastosowanie spowalniania wypowiedzi w celu poprawy rozumienia mowy przez dzieci w szkole
Publikacja
- A. Kupryjanow
- A. Czyżewski
- Rok 2009
This paper presents a time-scale modification algorithms that could be used for hearing impairment therapy supported by real-time speech stretching. In this paper the OLA based algorithms and Phase Vocoder were described. In the experimental part usability of those algorithms for real-time speech stretching was discussed
Instantaneous complex frequency for pipeline pitch estimation
Publikacja
- M. [. Kaniewska
- Rok 2010
In the paper a pipeline algorithm for estimating the pitch of speech signal is proposed. The algorithm uses instantaneous complex frequencies estimated for four waveforms obtained by filtering the original speech signal through four bandpass complex Hilbert filters. The imaginary parts of ICFs from each channel give four candidates for pitch estimates. The decision regarding the final estimate is made based on the real parts of...
XVIII Międzynarodowe Sympozjum Inżynierii i Reżyserii Dźwięku
Publikacja
- P. Falkowski-Gilski
- S. Brachmański
- A. Dobrucki
- M. Kin
- Rok 2021
The subjective assessment of speech signals takes into account previous experiences and habits of an individual. Since the perception process deteriorates with age, differences should be noticeable among people from dissimilar age groups. In this work, we investigated the difference of speech quality assessment between high school students and university students. The study involved 60 participants, with 30 people in both the adolescents...

Pełny tekst do pobrania w serwisie zewnętrznym
PHONEME DISTORTION IN PUBLIC ADDRESS SYSTEMS
Publikacja
- I. Kochańska
- H. Lasota
- Rok 2015
The quality of voice messages in speech reinforcement and public address systems is often poor. The sound engineering projects of such systems take care of sound intensity and possible reverberation phenomena in public space without, however, considering the influence of acoustic interference related to the number and distribution of loudspeakers. This paper presents the results of measurements and numerical simulations of the...
Human voice modification using instantaneous complex frequency
Publikacja
- M. Kaniewska
- Rok 2010
The paper presents the possibilities of changing human voice by modifying instantaneous complex frequency (ICF) of the speech signal. The proposed method provides a flexible way of altering voice without the necessity of finding fundamental frequency and formants' positions or detecting voiced and unvoiced fragments of speech. The algorithm is simple and fast. Apart from ICF it uses signal factorization into two factors: one fully...
Investigating Feature Spaces for Isolated Word Recognition
Publikacja
- P. Treigys
- G. Korvel
- G. Tamulevicius
- J. Bernataviciene
- B. Kostek
- Rok 2020
The study addresses the issues related to the appropriateness of a two-dimensional representation of speech signal for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and...

Pełny tekst do pobrania w serwisie zewnętrznym
Auditory-visual attention stimulator
Publikacja
- Rok 2013
New approach to lateralization irregularities formation was proposed. The emphasis is put on the relationship between visual and auditory attention stimulation. In this approach hearing is stimulated using time scale modified speech and sight is stimulated by rendering the text of the currently heard speech. Moreover, displayed text is modified using several techniques i.e. zooming, highlighting etc. In the experimental part of...

Pełny tekst do pobrania w serwisie zewnętrznym
INVESTIGATION OF THE LOMBARD EFFECT BASED ON A MACHINE LEARNING APPROACH
Publikacja
- G. Korvel
- P. Treigys
- K. Kąkol
- B. Kostek
- International Journal of Applied Mathematics and Computer Science - Rok 2023
The Lombard effect is an involuntary increase in the speaker’s pitch, intensity, and duration in the presence of noise. It makes it possible to communicate in noisy environments more effectively. This study aims to investigate an efficient method for detecting the Lombard effect in uttered speech. The influence of interfering noise, room type, and the gender of the person on the detection process is examined. First, acoustic parameters...

Pełny tekst do pobrania w portalu
Audio-visual aspect of the Lombard effect and comparison with recordings depicting emotional states.
Publikacja
- Rok 2018
In this paper an analysis of audio-visual recordings of the Lombard effect is shown. First, audio signal is analyzed indicating the presence of this phenomenon in the recorded sessions. The principal aim, however, was to discuss problems related to extracting differences caused by the Lombard effect, present in the video , i.e. visible as tension and work of facial muscles aligned to an increase in the intensity of the articulated...

Pełny tekst do pobrania w serwisie zewnętrznym
Variable Ratio Sample Rate Conversion Based on Fractional Delay Filter
Publikacja
- M. Blok
- P. Drózda
- Archives of Acoustics - Rok 2014
In this paper a sample rate conversion algorithm which allows for continuously changing resampling ratio has been presented. The proposed implementation is based on a variable fractional delay filter which is implemented by means of a Farrow structure. Coefficients of this structure are computed on the basis of fractional delay filters which are designed using the offset window method. The proposed approach allows us to freely...

Pełny tekst do pobrania w portalu
Prof. Haitham Abu-Rub - A Visit to Poland's Gdansk University of Technology
Publikacja
- J. Guziński
- IEEE Industrial Electronics Magazine - Rok 2015
Report on visit of Prof. Haitham Abu-Rub in Gdansk University of Technology. Speech on the Smart Grid Centre. Visit in the new smart grid laboratory of the GUT, the Laboratory for Innovative Power Technologies and Integration of Renewable Energy Sources (LINTE^2).

Pełny tekst do pobrania w portalu
Mispronunciation Detection in Non-Native (L2) English with Uncertainty Modeling
Publikacja
- D. Korzekwa
- J. Lorenzo-trueba
- S. Zaporowski
- S. Calamaro
- T. Drugman
- B. Kostek
- Rok 2021
A common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare it to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: a) phonemes can be recognized from speech with high accuracy, b) there is a single correct way for a sentence to be pronounced. These assumptions do not always hold, which can result...

Pełny tekst do pobrania w serwisie zewnętrznym
A Comparison of STI Measured by Direct and Indirect Methods for Interiors Coupled with Sound Reinforcement Systems
Publikacja
- Rok 2018
This paper presents a comparison of STI (Speech Transmission Index) coefficient measurement results carried out by direct and indirect methods. First, acoustic parameters important in the context of public address and sound reinforcement systems are recalled. A measurement methodology is presented that employs various test signals to determine impulse responses. The process of evaluating sound system performance, signals enabling...

Pełny tekst do pobrania w serwisie zewnętrznym
Rediscovering Automatic Detection of Stuttering and Its Subclasses through Machine Learning—The Impact of Changing Deep Model Architecture and Amount of Data in the Training Set
Publikacja
- P. Filipowicz
- B. Kostek
- Applied Sciences-Basel - Rok 2023
This work deals with automatically detecting stuttering and its subclasses. An effective classification of stuttering along with its subclasses could find wide application in determining the severity of stuttering by speech therapists, preliminary patient diagnosis, and enabling communication with the previously mentioned voice assistants. The first part of this work provides an overview of examples of classical and deep learning...

Pełny tekst do pobrania w portalu
A comparative study of English viseme recognition methods and algorithms
Publikacja
- MULTIMEDIA TOOLS AND APPLICATIONS - Rok 2018
An elementary visual unit – the viseme is concerned in the paper in the context of preparing the feature vector as a main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative research on their effectiveness. In the course of the study an optimal feature vector construction...

Pełny tekst do pobrania w portalu

Wyszukiwarka

Filtry

Katalog

Wyniki wyszukiwania dla: SPEECH SYNTHESIS

Orken Mamyrbayev Professor