Search results for: NEURAL TEXT-TO-SPEECH MULTILINGUAL SYNTHESIS VOICE CONVERSION SYNTHETIC DATA NORMALISING FLOWS

Total results: 11096

Showing the top 1000 results

Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech
Publication
- D. Piotrowski
- R. Korzeniowski
- A. Falai
- S. Cygert
- K. Pokora
- G. Tinchev
- Z. Zhang
- K. Yanagisawa
- Year 2023
In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...

Full text available for download from an external service
Creating new voices using normalizing flows
Publication
- P. Biliński
- T. Merritt
- A. Ezzerg
- K. Pokora
- S. Cygert
- K. Yanagisawa
- R. Barra-Chicote
- D. Korzekwa
- Year 2022
Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. Firstly, we create an approach for TTS...

Full text available for download in the portal
Computer-assisted pronunciation training—Speech synthesis is almost all you need
Publication
- D. Korzekwa
- J. Lorenzo-Trueba
- T. Drugman
- B. Kostek
- SPEECH COMMUNICATION - Year 2022
The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high...

Full text available for download in the portal
Orken Mamyrbayev Professor

People

1. Education: Higher. In 2001, graduated from the Abay Almaty State University (now Abay Kazakh National Pedagogical University), in the specialty: Computer science and computerization manager. 2. Academic degree: Ph.D. in the specialty "6D070300-Information systems". The dissertation was defended in 2014 on the topic: "Kazakh soileulerin tanudyn kupmodaldy zhuyesin kuru". Under my supervision, 16 masters, 1 dissertation...
Time-domain prosodic modifications for text-to-speech synthesizer
Publication
- J. Łopatka
- P. Suchomski
- A. Czyżewski
- Year 2010
An application of prosodic speech processing algorithms to Text-To-Speech synthesis is presented. Prosodic modifications that improve the naturalness of the synthesized signal are discussed. The applied method is based on the TD-PSOLA algorithm. The developed Text-To-Speech Synthesizer is used in applications employing multimodal computer interfaces.
Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition
Publication
- Year 2016
The problem of accurately differentiating between the speaker utterance and the noise parts in a speech signal is considered. The influence of utilizing voice activity detection in speech signals on the accuracy of the automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting the voice activity in clean and noisy...
Optimizing Medical Personnel Speech Recognition Models Using Speech Synthesis and Reinforcement Learning
Publication
- A. Czyżewski
- Journal of the Acoustical Society of America - Year 2023
Text-to-Speech synthesis (TTS) can be used to generate training data for building Automatic Speech Recognition (ASR) models. Access to medical speech data is limited because it is sensitive data that is difficult to obtain for privacy reasons; TTS can help expand the data set. Speech can be synthesized by mimicking different accents, dialects, and speaking styles that may occur in a medical language. Reinforcement Learning (RL), in the...

Full text available for download in the portal
SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM
Publication
- B. Kostek
- B. Szyca
- Journal of the Acoustical Society of America - Year 2023
The main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...

Full text available for download in the portal
Comparison of the Ability of Neural Network Model and Humans to Detect a Cloned Voice
Publication
- Electronics - Year 2023
The vulnerability of the speaker identity verification system to attacks using voice cloning was examined. The research project assumed creating a model for verifying the speaker’s identity based on voice biometrics and then testing its resistance to potential attacks using voice cloning. The Deep Speaker Neural Speaker Embedding System was trained, and the Real-Time Voice Cloning system was employed based on the SV2TTS, Tacotron,...

Full text available for download in the portal
Virtual keyboard controlled by eye gaze employing speech synthesis
Publication
- B. Kunka
- R. Rybacki
- K. Łopatka
- A. Czyżewski
- B. Kostek
- Year 2010
The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...
Virtual Keyboard controlled by eye gaze employing speech synthesis
Publication
- K. Łopatka
- R. Rybacki
- B. Kunka
- A. Czyżewski
- B. Kostek
- Elektronika : konstrukcje, technologie, zastosowania - Year 2011
The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

Full text available for download from an external service
Deep neural networks for data analysis
Online Courses
- K. Draszawka
The aim of the course is to familiarize students with the methods of deep learning for advanced data analysis. Typical areas of application of these types of methods include: image classification, speech recognition and natural language understanding.
Speech synthesis controlled by eye gazing
Publication
- A. Czyżewski
- K. Łopatka
- B. Kunka
- R. Rybacki
- B. Kostek
- Year 2010
A method of communication based on eye gaze control is presented. Investigations of using gaze tracking have been carried out in various application contexts. The solution proposed in the paper could be referred to as "talking by eyes", providing an innovative approach in the domain of speech synthesis. The proposed application is dedicated to disabled people, especially to persons with so-called locked-in syndrome who cannot...
Magdalena Szuflita-Żurawska

People

Gdańsk University of Technology (Politechnika Gdańska), Scientific and Technical Information Section, PG Library

Magdalena Szuflita-Żurawska is head of the Scientific and Technical Information Section at the Gdańsk University of Technology and Leader of the Open Science Competence Center at the Gdańsk University of Technology Library. Her main research interests focus on scholarly communication and open research data, as well as on scientific motivation and productivity. She is responsible for, among other things, conducting training for staff...
Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine
Publication
- P. Falkowski-Gilski
- G. Debita
- Archives of Acoustics - Year 2023
In order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as: available bandwidth, bit error rate (BER), delay, latency as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...

Full text available for download in the portal
Interactive Information Search in Text Data Collections
Publication
- Year 2013
This article presents a new idea for retrieval in text repositories and describes the general infrastructure of a system created to implement and test this idea. The implemented system differs from today’s standard search engines by introducing a process of interactive search with users and data clustering. We present the basic algorithms behind our system and the measures we used for evaluating results. The achieved results...

Full text available for download from an external service
Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets
Publication
- Electronics - Year 2022
Artificial Neural Network (ANN) models, specifically Convolutional Neural Networks (CNN), were applied to extract emotions based on spectrograms and mel-spectrograms. This study uses spectrograms and mel-spectrograms to investigate which feature extraction method better represents emotions and how big the differences in efficiency are in this context. The conducted studies demonstrated that mel-spectrograms are a better-suited...

Full text available for download in the portal
Voice and Speech Review

Journals

ISSN: 2326-8263, eISSN: 2326-8271
Hybrid of Neural Networks and Hidden Markov Models as a modern approach to speech recognition systems
Publication
- P. Sokólski
- T. A. Rutkowski
- Pomiary Automatyka Robotyka - Year 2013
The aim of this paper is to present a hybrid algorithm that combines the advantages of artificial neural networks and hidden Markov models in speech recognition for control purposes. The scope of the paper includes a review of currently used solutions, and a description and analysis of the implementation of selected artificial neural network (NN) structures and hidden Markov models (HMM). The main part of the paper consists of a description...

Full text available for download in the portal
A Text as a Set of Research Data. A Number of Aspects of Data Acquisition and Creation of Datasets in Neo-Latin Studies
Publication
- Year 2022
In this paper, the authors, who specialise in part in neo-Latin studies and the history of early modern education, share their experiences of collecting sources for Open Research Data sets under the Bridge of Data project. On the basis of inscription texts from St. Mary’s Church in Gdańsk, they created 29 Open Research Data sets. In turn, the text of the lectures of the Gdańsk scholar Michael Christoph Hanow, Praecepta de arte...

Full text available for download in the portal
Anna Baj-Rogowska dr

People

Department of Informatics in Management

Anna Baj-Rogowska is employed as an assistant professor in the Department of Informatics in Management (Gdańsk University of Technology, Faculty of Management and Economics). Her higher education is associated with the University of Gdańsk, where she completed a master's degree in computer science and doctoral studies, and then obtained a PhD in economics in the field of management sciences (Department of Business Informatics at the Faculty of Management...
Deep neural networks for data analysis 24/25
Online Courses
- J. Cychnerski
- K. Draszawka
This course covers an introduction to supervised machine learning, the construction of basic artificial deep neural networks (DNNs) and basic training algorithms, as well as an overview of popular DNN architectures (convolutional networks, recurrent networks, transformers). The course introduces students to popular regularization techniques for deep models. Besides theory, a large part of the course is the project in which students apply...
An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics
Publication
- G. Korvel
- O. Kurasova
- B. Kostek
- Year 2019
The speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...

Full text available for download in the portal
Evaluation and Irony in Text in the Light of Speech Act Theory
Publication
- K. Kukowicz-Zarska
- Forum Filologiczne Ateneum - Year 2020
Full text available for download from an external service
Secured wired BPL voice transmission system
Publication
- G. Debita
- P. Falkowski-Gilski
- M. Habrych
- B. Miedziński
- J. Wandzio
- P. Jedlikowski
- Scientific Journal of the Military University of Land Forces - Year 2020
Designing a secured voice transmission system is not a trivial task. Wired media, thanks to their reliability and resistance to mechanical damage, seem an ideal solution. The BPL (Broadband over Power Line) cable is resistant to electricity stoppage and partial damage of phase conductors, ensuring continuity of transmission in case of an emergency. It seems an appropriate tool for delivering critical data, mostly clear and understandable...

Full text available for download in the portal
A survey of automatic speech recognition deep models performance for Polish medical terms
Publication
- Year 2023
Among the numerous applications of speech-to-text technology is the support of documentation created by medical personnel. There are many available speech recognition systems for doctors. Their effectiveness in languages such as Polish should be verified. In connection with our project in this field, we decided to check how well the popular speech recognition systems work, employing models trained for the general Polish language....

Full text available for download from an external service
Human voice modification using instantaneous complex frequency
Publication
- M. Kaniewska
- Year 2010
The paper presents the possibilities of changing human voice by modifying instantaneous complex frequency (ICF) of the speech signal. The proposed method provides a flexible way of altering voice without the necessity of finding fundamental frequency and formants' positions or detecting voiced and unvoiced fragments of speech. The algorithm is simple and fast. Apart from ICF it uses signal factorization into two factors: one fully...
SYNTHESIS-STUTTGART

Journals

ISSN: 0039-7881, eISSN: 1437-210X
Olgun Aydin dr

People

Department of Statistics and Econometrics

Olgun Aydin finished his PhD by publishing a thesis about Deep Neural Networks. He works as a Principal Machine Learning Engineer at Nike and as an Assistant Professor at Gdansk University of Technology in Poland. Dr. Aydin is on the editorial board of the "Journal of Artificial Intelligence and Data Science". Dr. Aydin served as Vice-Chairman of the Why R? Foundation and is a member of the Polish Artificial Intelligence Society. Olgun is...
New approach for determining the QoS of MP3-coded voice signals in IP networks
Publication
- T. Uhl
- S. Paulsen
- K. Nowicki
- EURASIP Journal on Audio Speech and Music Processing - Year 2017
Present-day IP transport platforms being what they are, it will never be possible to rule out conflicts between the available services. The logical consequence of this assertion is the inevitable conclusion that the quality of service (QoS) must always be quantifiable no matter what. This paper focuses on one method to determine QoS. It defines an innovative, simple model that can evaluate the QoS of MP3-coded voice data transported...

Full text available for download in the portal
Automatic prosodic modification in a Text-To-Speech synthesizer of Polish language
Publication
- K. Łopatka
- P. Suchomski
- A. Czyżewski
- Elektronika : konstrukcje, technologie, zastosowania - Year 2011
A Polish speech synthesis system with automatic modification of utterance prosody is presented. Methods for automatically determining the stress and intonation of an utterance are described. The application of speech signal processing algorithms in shaping prosody is presented. The influence of the applied modifications on the naturalness of the synthesized signal is discussed. The applied method is based on the TD-PSOLA algorithm. The developed...
SYNTHETIC METALS

Journals

ISSN: 0379-6779
Speaker Recognition Using Convolutional Neural Network with Minimal Training Data for Smart Home Solutions
Publication
- M. Wang
- T. Sirlapu
- A. Kwaśniewska
- M. Szankin
- M. Bartscherer
- R. Nicolas
- Year 2018
With the technology advancements in the smart home sector, voice control and automation are key components that can make a real difference in people's lives. The voice recognition technology market continues to evolve rapidly as almost all smart home devices are providing speaker recognition capability today. However, most of them provide cloud-based solutions or use very deep Neural Networks for the speaker recognition task, which are...

Full text available for download from an external service
Data set generation at novel test-rig for validation of numerical models for modeling granular flows
Publication
- A. Widuch
- K. Myöhänen
- M. Nikku
- M. L. Nowak
- A. Klimanek
- W. Adamczyk
- INTERNATIONAL JOURNAL OF MULTIPHASE FLOW - Year 2021
Significant effort has been exerted on developing fast and reliable numerical models for modeling particulate flow; this is challenging owing to the complexity of such flows. To achieve this, reliable and high-quality experimental data are required for model development and validation. This study presents the design of a novel test-rig that allows the visualization and measurement of particle flow patterns during the collision...

Full text available for download in the portal
ENERGY CONVERSION AND MANAGEMENT

Journals

ISSN: 0196-8904, eISSN: 1879-2227
Improved method for real-time speech stretching
Publication
- A. Kupryjanow
- A. Czyżewski
- Year 2012
An algorithm for real-time speech stretching is presented. It was designed to modify the input signal depending on its content and on its relation to the historical input data. The proposed algorithm is a combination of speech signal analysis algorithms, i.e. voice, vowels/consonants, stuttering detection and SOLA (Synchronous-Overlap-and-Add) based speech stretching algorithm. This approach enables stretching input speech signal...

Full text available for download from an external service
SYNTHETIC COMMUNICATIONS

Journals

ISSN: 0039-7911, eISSN: 1532-2432
System for automatic singing voice recognition
Publication
- P. Żwan
- B. Kostek
- JOURNAL OF THE AUDIO ENGINEERING SOCIETY - Year 2008
The article presents a system for automatic recognition of singing voice quality and type. The database and the implemented parameters are described. The decision algorithm is an artificial neural network algorithm. The trained decision system achieves an accuracy of about 90% in both recognition categories. In addition, it was shown using statistical methods that the results of the system for automatic assessment of technical quality...
Preeclampsia Risk Prediction Using Machine Learning Methods Trained on Synthetic Data
Publication
- M. Mazur-Milecka
- N. Kowalczyk
- K. Jaguszewska
- D. Zamkowska
- D. Wójcik
- K. Preis
- H. Skov
- S. R. Wagner
- P. Sandager
- M. Sobotka
- J. Rumiński
- Year 2024
This paper describes a research study that investigates the use of machine learning algorithms on synthetic data to classify the risk of developing preeclampsia by pregnant women. Synthetic datasets were generated based on parameter distributions from three real patient studies. Four models were compared: XGBoost, Support Vector Machine (SVM), Random Forest, and Explainable Boosting Machines (EBM). The study found that the XGBoost...

Full text available for download from an external service
Voice Multilateration System
Publication
- SENSORS - Year 2021
This paper presents an innovative method of locating airplanes, which uses only voice communication between an air traffic controller and the pilot of an aircraft. The proposed method is described in detail along with its practical implementation in the form of a technology demonstrator (proof of concept), included in the voice communication system (VCS). A complete analysis of the performance of the developed method is presented,...

Full text available for download in the portal
COMPLEXITY OF INNOVATIVE FINANCIAL PRODUCTS: THE CASE OF SYNTHETIC EXCHANGE TRADED FUNDS IN EUROPE
Publication
- A. Marszk
- Nauki o Finansach. Prace Naukowe Uniwersytetu Ekonomicznego we Wrocławiu - Year 2016
The aim of the text is the presentation of the most important categories of exchange traded funds (ETFs) – physical and synthetic ones. A theoretical part of the text includes an overview of the main features of ETFs, the presentation of differences between physical and synthetic funds and the main risks posed by both types to their users and the whole financial systems. An empirical part focuses on the European market. Time span...

Full text available for download in the portal
Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement
Publication
- G. Korvel
- K. Kąkol
- O. Kurasova
- B. Kostek
- IEEE Access - Year 2020
The Lombard effect is one of the most well-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech...

Full text available for download in the portal
Automatic singing voice recognition employing neural networks and rough sets
Publication
- Year 2007
The aim of the work described in the paper is the automatic recognition of singing voices. For this purpose, a database of recordings of professional and amateur singing samples was created. The samples were parametrized using parameters proposed by the authors specifically for this purpose. The method of determining the parameters and their physical interpretation are presented in the paper. The parameters are fed into decision systems, classifiers based...
Speech Analytics Based on Machine Learning
Publication
- Year 2019
In this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...

Full text available for download from an external service
3D seafloor reconstruction using data from side scan and synthetic aperture sonar
Publication
- K. Bikonis
- Z. Łubniewski
- HYDROACOUSTICS - Year 2010
Side scan and synthetic aperture sonars are widely used imaging systems in the underwater environment. They are relatively cheap and easy to deploy, in comparison with more powerful sensors, like multibeam echosounders. Although side scan and synthetic aperture sonars do not provide seafloor bathymetry directly, their records are finally related to seafloor images. Moreover, the analysis of such images performed by human eye...

Full text available for download in the portal
Synthesis and Characterization of Poly(zwitterionic) Structures for Energy Conversion and Storage
Publication
- A. Olejnik
- S. Katarzyna
- K. Grochowska
- Year 2021
Zwitterions are a unique class of molecules that possess two functional groups bearing electric charges, one positive and the other negative. This setup results in peculiar properties such as high water retention and anti-fouling capability. Therefore, zwitterionic coatings and gels are commonly applied in e.g. biosensing and bioelectronic devices. Despite those applications, there are other perspectives for zwitterionic materials....

Full text available for download from an external service
Language Models in Speech Recognition
Publication
- J. Daciuk
- Year 2022
This chapter describes language models used in speech recognition. It starts by indicating the role and the place of language models in speech recognition. Measures used to compare language models follow. An overview of n-gram, syntactic, semantic, and neural models is given. It is accompanied by a list of popular software.

Full text available for download from an external service
Ag modified ZnO microsphere synthesis for efficient sonophotocatalytic degradation of organic pollutants and CO2 conversion
Publication
- M. F. Khan
- S. u. H. Bakhtiar
- A. Zada
- F. Raziq
- H. A. Saleemi
- M. S. Khan
- P. M. Ismail
- A. C. Alguno
- R. Y. Capangpangan
- A. Ali... and 4 others
- Environmental Nanotechnology, Monitoring and Management - Year 2022
The synthesis and design of non-precious and efficient sonophotocatalysts by an environmentally friendly technique are requisites for solar energy conversion and environmental remediation. This work reports the preparation of Ag/ZnO microspheres with different Ag contents through a deposition–precipitation method for pollutant degradation and CO2 conversion. Detailed structural investigation reveals that ZnO microspheres and Ag-ZnO microspheres...

Full text available for download from an external service
Voice command recognition using hybrid genetic algorithm
Publication
- M. Wroniszewska
- J. Dziedzic
- TASK Quarterly - Year 2010
Abstract: Speech recognition is a process of converting the acoustic signal into a set of words, whereas voice command recognition consists in the correct identification of voice commands, usually single words. Voice command recognition systems are widely used in the military, control systems, electronic devices, such as cellular phones, or by people with disabilities (e.g., for controlling a wheelchair or operating a computer...

Full text available for download in the portal
Automatic Image and Speech Recognition Based on Neural Network
Publication
- D. Król
- B. Szlachetko
- Journal of Information Technology Research - Year 2010
Full text available for download from an external service