Search results for: NEURAL TEXT-TO-SPEECH MULTILINGUAL SYNTHESIS VOICE CONVERSION SYNTHETIC DATA NORMALISING FLOWS - MOST Wiedzy

  • Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech

    Publication
    • D. Piotrowski
    • R. Korzeniowski
    • A. Falai
    • S. Cygert
    • K. Pokora
    • G. Tinchev
    • Z. Zhang
    • K. Yanagisawa

    - Year 2023

    In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...

    Full text available for download from an external service

  • Creating new voices using normalizing flows

    Publication
    • P. Biliński
    • T. Merritt
    • A. Ezzerg
    • K. Pokora
    • S. Cygert
    • K. Yanagisawa
    • R. Barra-Chicote
    • D. Korzekwa

    - Year 2022

    Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. Firstly, we create an approach for TTS...

    Full text available for download in the portal

  • Computer-assisted pronunciation training—Speech synthesis is almost all you need

    Publication

    - SPEECH COMMUNICATION - Year 2022

    The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high...

    Full text available for download in the portal

  • Orken Mamyrbayev Professor

    People

    1.  Education: Higher. In 2001, graduated from the Abay Almaty State University (now Abay Kazakh National Pedagogical University), in the specialty: Computer science and computerization manager. 2.  Academic degree: Ph.D. in the specialty "6D070300-Information systems". The dissertation was defended in 2014 on the topic: "Kazakh soileulerin tanudyn kupmodaldy zhuyesin kuru". Under my supervision, 16 masters, 1 dissertation...

  • Time-domain prosodic modifications for text-to-speech synthesizer

    Publication

    - Year 2010

    An application of prosodic speech processing algorithms to Text-To-Speech synthesis is presented. Prosodic modifications that improve the naturalness of the synthesized signal are discussed. The applied method is based on the TD-PSOLA algorithm. The developed Text-To-Speech Synthesizer is used in applications employing multimodal computer interfaces.

  • Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition

    Publication

    The problem of accurately differentiating between the speaker's utterance and the noise parts of a speech signal is considered. The influence of applying voice activity detection to speech signals on the accuracy of an automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting the voice activity in clean and noisy...

  • Optimizing Medical Personnel Speech Recognition Models Using Speech Synthesis and Reinforcement Learning

    Text-to-Speech synthesis (TTS) can be used to generate training data for building Automatic Speech Recognition (ASR) models. Access to medical speech data is limited because it is sensitive data that is difficult to obtain for privacy reasons; TTS can help expand the data set. Speech can be synthesized by mimicking different accents, dialects, and speaking styles that may occur in medical language. Reinforcement Learning (RL), in the...

    Full text available for download in the portal

  • SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM

    The main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...

    Full text available for download in the portal

  • Comparison of the Ability of Neural Network Model and Humans to Detect a Cloned Voice

    The vulnerability of the speaker identity verification system to attacks using voice cloning was examined. The research project assumed creating a model for verifying the speaker’s identity based on voice biometrics and then testing its resistance to potential attacks using voice cloning. The Deep Speaker Neural Speaker Embedding System was trained, and the Real-Time Voice Cloning system was employed based on the SV2TTS, Tacotron,...

    Full text available for download in the portal

  • Deep neural networks for data analysis

    Online Courses
    • K. Draszawka

    The aim of the course is to familiarize students with the methods of deep learning for advanced data analysis. Typical areas of application of these types of methods include: image classification, speech recognition and natural language understanding.

  • Virtual Keyboard controlled by eye gaze employing speech synthesis

    The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

    Full text available for download from an external service

  • Virtual keyboard controlled by eye gaze employing speech synthesis

    Publication

    The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

  • Speech synthesis controlled by eye gazing

    Publication

    A method of communication based on eye gaze control is presented. Investigations of gaze tracking have been carried out in various application contexts. The solution proposed in the paper could be referred to as "talking by eyes", providing an innovative approach in the domain of speech synthesis. The proposed application is dedicated to disabled people, especially to persons with so-called locked-in syndrome who cannot...

  • Magdalena Szuflita-Żurawska

    Magdalena Szuflita-Żurawska is the head of the Scientific and Technical Information Section at Gdańsk University of Technology and the Leader of the Open Science Competence Centre at the Gdańsk University of Technology Library. Her main research interests focus on scholarly communication and open research data, as well as scientific motivation and productivity. She is responsible for, among other things, conducting training for staff...

  • Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine

    Publication

    In order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as: available bandwidth, bit error rate (BER), delay, latency as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...

    Full text available for download in the portal

  • Interactive Information Search in Text Data Collections

    Publication

    This article presents a new idea for retrieval in text repositories and describes the general infrastructure of a system created to implement and test this idea. The implemented system differs from today's standard search engines by introducing a process of interactive search with users and data clustering. We present the basic algorithms behind our system and the measures we used for evaluating the results. The achieved results...

    Full text available for download from an external service

  • Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets

    Artificial Neural Network (ANN) models, specifically Convolutional Neural Networks (CNN), were applied to extract emotions based on spectrograms and mel-spectrograms. This study uses spectrograms and mel-spectrograms to investigate which feature extraction method better represents emotions and how big the differences in efficiency are in this context. The conducted studies demonstrated that mel-spectrograms are a better-suited...

    Full text available for download in the portal

  • Voice and Speech Review

    Journals

    ISSN: 2326-8263 , eISSN: 2326-8271

  • Hybrid of Neural Networks and Hidden Markov Models as a modern approach to speech recognition systems

    The aim of this paper is to present a hybrid algorithm that combines the advantages of artificial neural networks and hidden Markov models in speech recognition for control purposes. The scope of the paper includes review of currently used solutions, description and analysis of implementation of selected artificial neural network (NN) structures and hidden Markov models (HMM). The main part of the paper consists of a description...

    Full text available for download in the portal

  • A Text as a Set of Research Data. A Number of Aspects of Data Acquisition and Creation of Datasets in Neo-Latin Studies

    Publication

    In this paper, the authors, who specialise in part in neo-Latin studies and the history of early modern education, share their experiences of collecting sources for Open Research Data sets under the Bridge of Data project. On the basis of inscription texts from St. Mary’s Church in Gdańsk, they created 29 Open Research Data sets. In turn, the text of the lectures of the Gdańsk scholar Michael Christoph Hanow, Praecepta de arte...

    Full text available for download in the portal

  • An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics

    Publication

    - Year 2019

    The speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...

    Full text available for download in the portal

  • Evaluation and Irony in Text in the Light of Speech Act Theory

    Full text available for download from an external service

  • Secured wired BPL voice transmission system

    Publication

    - Scientific Journal of the Military University of Land Forces - Year 2020

    Designing a secured voice transmission system is not a trivial task. Wired media, thanks to their reliability and resistance to mechanical damage, seem an ideal solution. The BPL (Broadband over Power Line) cable is resistant to electricity stoppage and partial damage of phase conductors, ensuring continuity of transmission in case of an emergency. It seems an appropriate tool for delivering critical data, mostly clear and understandable...

    Full text available for download in the portal

  • A survey of automatic speech recognition deep models performance for Polish medical terms

    Among the numerous applications of speech-to-text technology is the support of documentation created by medical personnel. There are many available speech recognition systems for doctors. Their effectiveness in languages such as Polish should be verified. In connection with our project in this field, we decided to check how well the popular speech recognition systems work, employing models trained for the general Polish language....

    Full text available for download from an external service

  • Human voice modification using instantaneous complex frequency

    Publication
    • M. Kaniewska

    - Year 2010

    The paper presents the possibilities of changing human voice by modifying instantaneous complex frequency (ICF) of the speech signal. The proposed method provides a flexible way of altering voice without the necessity of finding fundamental frequency and formants' positions or detecting voiced and unvoiced fragments of speech. The algorithm is simple and fast. Apart from ICF it uses signal factorization into two factors: one fully...

  • New approach for determining the QoS of MP3-coded voice signals in IP networks

    Publication

    Present-day IP transport platforms being what they are, it will never be possible to rule out conflicts between the available services. The logical consequence of this assertion is the inevitable conclusion that the quality of service (QoS) must always be quantifiable no matter what. This paper focuses on one method to determine QoS. It defines an innovative, simple model that can evaluate the QoS of MP3-coded voice data transported...

    Full text available for download in the portal

  • Automatic prosodic modification in a Text-To-Speech synthesizer of Polish language

    A Polish speech synthesis system with automatic modification of utterance prosody is presented. Methods for automatically determining the stress and intonation of an utterance are described. The application of speech signal processing algorithms to shaping prosody is presented. The influence of the applied modifications on the naturalness of the synthesized signal is discussed. The applied method is based on the TD-PSOLA algorithm. The developed...

  • SYNTHESIS-STUTTGART

    Journals

    ISSN: 0039-7881 , eISSN: 1437-210X

  • Speaker Recognition Using Convolutional Neural Network with Minimal Training Data for Smart Home Solutions

    Publication

    - Year 2018

    With technology advancements in the smart home sector, voice control and automation are key components that can make a real difference in people's lives. The voice recognition technology market continues to evolve rapidly, as almost all smart home devices provide speaker recognition capability today. However, most of them provide cloud-based solutions or use very deep Neural Networks for the speaker recognition task, which are...

    Full text available for download from an external service

  • Data set generation at novel test-rig for validation of numerical models for modeling granular flows

    Publication
    • A. Widuch
    • K. Myöhänen
    • M. Nikku
    • M. L. Nowak
    • A. Klimanek
    • W. Adamczyk

    - INTERNATIONAL JOURNAL OF MULTIPHASE FLOW - Year 2021

    Significant effort has been exerted on developing fast and reliable numerical models for modeling particulate flow; this is challenging owing to the complexity of such flows. To achieve this, reliable and high-quality experimental data are required for model development and validation. This study presents the design of a novel test-rig that allows the visualization and measurement of particle flow patterns during the collision...

    Full text available for download in the portal

  • Olgun Aydin dr

    Olgun Aydin finished his PhD by publishing a thesis about Deep Neural Networks. He works as a Principal Machine Learning Engineer at Nike and as an Assistant Professor at Gdansk University of Technology in Poland. Dr. Aydin is on the editorial board of the "Journal of Artificial Intelligence and Data Science". He served as Vice-Chairman of the Why R? Foundation and is a member of the Polish Artificial Intelligence Society. Olgun is...

  • SYNTHETIC METALS

    Journals

    ISSN: 0379-6779

  • Improved method for real-time speech stretching

    Publication

    An algorithm for real-time speech stretching is presented. It was designed to modify the input signal depending on its content and on its relation to the historical input data. The proposed algorithm is a combination of speech signal analysis algorithms, i.e. voice, vowels/consonants, stuttering detection and SOLA (Synchronous-Overlap-and-Add) based speech stretching algorithm. This approach enables stretching input speech signal...

    Full text available for download from an external service

  • Preeclampsia Risk Prediction Using Machine Learning Methods Trained on Synthetic Data

    Publication

    - Year 2024

    This paper describes a research study that investigates the use of machine learning algorithms on synthetic data to classify the risk of developing preeclampsia in pregnant women. Synthetic datasets were generated based on parameter distributions from three real patient studies. Four models were compared: XGBoost, Support Vector Machine (SVM), Random Forest, and Explainable Boosting Machines (EBM). The study found that the XGBoost...

    Full text available for download from an external service

  • Anna Baj-Rogowska dr

    Anna Baj-Rogowska is employed as an assistant professor in the Department of Informatics in Management (Gdańsk University of Technology, Faculty of Management and Economics). Her higher education is associated with the University of Gdańsk, where she completed master's studies in computer science and doctoral studies, and then obtained a doctoral degree in economics in the field of management sciences (Department of Business Informatics at the Faculty of Management...

  • System for automatic singing voice recognition

    The article presents a system for automatic recognition of the quality and type of a singing voice. The database and the implemented parameters are presented. The decision algorithm is an artificial neural network algorithm. The trained decision system achieves an accuracy of approx. 90% in both recognition categories. Additionally, statistical methods were used to show that the results of the system for automatic assessment of technical quality...

  • COMPLEXITY OF INNOVATIVE FINANCIAL PRODUCTS: THE CASE OF SYNTHETIC EXCHANGE TRADED FUNDS IN EUROPE

    The aim of the text is the presentation of the most important categories of exchange traded funds (ETFs) – physical and synthetic ones. A theoretical part of the text includes an overview of the main features of ETFs, the presentation of differences between physical and synthetic funds and the main risks posed by both types to their users and the whole financial systems. An empirical part focuses on the European market. Time span...

    Full text available for download in the portal

  • SYNTHETIC COMMUNICATIONS

    Journals

    ISSN: 0039-7911 , eISSN: 1532-2432

  • Voice Multilateration System

    This paper presents an innovative method of locating airplanes, which uses only voice communication between an air traffic controller and the pilot of an aircraft. The proposed method is described in detail along with its practical implementation in the form of a technology demonstrator (proof of concept), included in the voice communication system (VCS). A complete analysis of the performance of the developed method is presented,...

    Full text available for download in the portal

  • Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement

    Publication

    - IEEE Access - Year 2020

    The Lombard effect is one of the most well-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech...

    Full text available for download in the portal

  • 3D seafloor reconstruction using data from side scan and synthetic aperture sonar

    Publication

    Side scan and synthetic aperture sonars are widely used imaging systems in the underwater environment. They are relatively cheap and easy to deploy in comparison with more powerful sensors, like multibeam echosounders. Although side scan and synthetic aperture sonars do not provide seafloor bathymetry directly, their records are finally related to seafloor images. Moreover, the analysis of such images performed by human eye...

    Full text available for download in the portal

  • ENERGY CONVERSION AND MANAGEMENT

    Journals

    ISSN: 0196-8904 , eISSN: 1879-2227

  • Automatic singing voice recognition employing neural networks and rough sets

    Publication

    The aim of the work described in the paper is the automatic recognition of singing voices. For this purpose, a database of recordings of professional and amateur singing samples was created. The samples were parameterized using parameters proposed by the authors specifically for this purpose. The method of determining the parameters and their physical interpretation are presented in the paper. The parameters are fed into decision systems, classifiers based...

  • Synthesis and Characterization of Poly(zwitterionic) Structures for Energy Conversion and Storage

    Publication

    - Year 2021

    Zwitterions are a unique class of molecules that possess two functional groups bearing electric charges, one positive and one negative. This setup results in peculiar properties such as high water retention and anti-fouling capability. Therefore, zwitterionic coatings and gels are commonly applied in e.g. biosensing and bioelectronic devices. Despite those applications, there are other perspectives for zwitterionic materials....

    Full text available for download from an external service

  • Speech Analytics Based on Machine Learning

    In this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...

    Full text available for download from an external service

  • Ag modified ZnO microsphere synthesis for efficient sonophotocatalytic degradation of organic pollutants and CO2 conversion

    Publication
    • M. F. Khan
    • S. u. H. Bakhtiar
    • A. Zada
    • F. Raziq
    • H. A. Saleemi
    • M. S. Khan
    • P. M. Ismail
    • A. C. Alguno
    • R. Y. Capangpangan
    • A. Ali... and 4 others

    - Environmental Nanotechnology, Monitoring and Management - Year 2022

    The synthesis and design of non-precious and efficient sonophotocatalysts by an environmentally friendly technique are requisites for solar energy conversion and environmental remediation. This work reports the preparation of Ag/ZnO microspheres with different Ag contents through a deposition–precipitation method for pollutant degradation and CO2 conversion. Detailed structural investigation reveals that ZnO microspheres and Ag-ZnO microspheres...

    Full text available for download from an external service

  • Language Models in Speech Recognition

    Publication

    - Year 2022

    This chapter describes language models used in speech recognition. It starts by indicating the role and the place of language models in speech recognition. Measures used to compare language models follow. An overview of n-gram, syntactic, semantic, and neural models is given. It is accompanied by a list of popular software.

    Full text available for download from an external service

  • Voice command recognition using hybrid genetic algorithm

    Publication

    Abstract: Speech recognition is a process of converting the acoustic signal into a set of words, whereas voice command recognition consists in the correct identification of voice commands, usually single words. Voice command recognition systems are widely used in the military, control systems, electronic devices, such as cellular phones, or by people with disabilities (e.g., for controlling a wheelchair or operating a computer...

    Full text available for download in the portal

  • Agile Commerce in the light of Text Mining

    The survey conducted for this study reveals that more than 84% of respondents have never encountered the term “agile commerce” and do not understand its meaning. At the same time, they are active participants of this strategy. Using digital channels as customers more often than ever before, they have already been included in the agile philosophy. Based on the above, the purpose of the study is to analyse major text sets containing...

    Full text available for download in the portal

  • Ontology-based text convolution neural network (TextCNN) for prediction of construction accidents

    Publication
    • S. Donghui
    • L. Zhigang
    • J. Zurada
    • A. Manikas
    • J. Guan
    • P. Weichbroth

    - KNOWLEDGE AND INFORMATION SYSTEMS - Year 2024

    The construction industry suffers from workplace accidents, including injuries and fatalities, which represent a significant economic and social burden for employers, workers, and society as a whole. The existing research on construction accidents heavily relies on expert evaluations, which often suffer from issues such as low efficiency, insufficient intelligence, and subjectivity. However, expert opinions provided in construction...

    Full text available for download from an external service