Wyniki wyszukiwania dla: rate of speech estimation - MOST Wiedzy

Wyszukiwarka

Wyniki wyszukiwania dla: rate of speech estimation

Wyniki wyszukiwania dla: rate of speech estimation

  • Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

    Publikacja
    • D. Korzekwa
    • R. Barra-Chicote
    • B. Kostek
    • T. Drugman
    • M. Łajszczak

    - Rok 2019

    We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...

    Pełny tekst do pobrania w portalu

  • Detecting Lombard Speech Using Deep Learning Approach

    Publikacja
    • K. Kąkol
    • G. Korvel
    • G. Tamulevicius
    • B. Kostek

    - SENSORS - Rok 2023

    Robust Lombard speech-in-noise detecting is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The framework proposed combines convolutional neural networks...

    Pełny tekst do pobrania w portalu

  • Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically

    Publikacja

    - Rok 2022

    The aim of this study is two-fold. First, we perform a series of experiments to examine the interference of different noises on speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Then, we apply this knowledge to detecting speech with the Lombard effect. This is for preparing a dataset for training a machine learning-based...

    Pełny tekst do pobrania w portalu

  • Speech Intelligibility Measurements in Auditorium

    Publikacja

    Speech intelligibility was measured in Auditorium Novum on Technical University of Gdansk (seating capacity 408, volume 3300 m3). Articulation tests were conducted; STI and Early Decay Time EDT coefficients were measured. Negative noise contribution to speech intelligibility was taken into account. Subjective measurements and objective tests reveal high speech intelligibility at most seats in auditorium. Correlation was found between...

    Pełny tekst do pobrania w portalu

  • Time-domain prosodic modifications for text-to-speech synthesizer

    Publikacja

    - Rok 2010

    An application of prosodic speech processing algorithms to Text-To-Speech synthesis is presented. Prosodic modifications that improve the naturalness of the synthesized signal are discussed. The applied method is based on the TD-PSOLA algorithm. The developed Text-To-Speech Synthesizer is used in applications employing multimodal computer interfaces.

  • Language Models in Speech Recognition

    Publikacja

    - Rok 2022

    This chapter describes language models used in speech recognition, It starts by indicating the role and the place of language models in speech recognition. Mesures used to compare language models follow. An overview of n-gram, syntactic, semantic, and neural models is given. It is accompanied by a list of popular software.

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions

    Publikacja

    Automatic speech recognition (ASR) is under constant development, especially in cases when speech is casually produced or it is acquired in various environment conditions, or in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is discussed in the presented work as the main focus in this process. ASR is widely implemented in mobile devices technology, but...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • A survey of automatic speech recognition deep models performance for Polish medical terms

    Among the numerous applications of speech-to-text technology is the support of documentation created by medical personnel. There are many available speech recognition systems for doctors. Their effectiveness in languages such as Polish should be verified. In connection with our project in this field, we decided to check how well the popular speech recognition systems work, employing models trained for the general Polish language....

    Pełny tekst do pobrania w serwisie zewnętrznym

  • An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics

    Publikacja

    - Rok 2019

    The speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...

    Pełny tekst do pobrania w portalu

  • Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning

    Publikacja
    • K. Kąkol

    - Rok 2023

    The Lombard effect is a phenomenon that results in speech intelligibility improvement when applied to noise. There are many distinctive features of Lombard speech that were recalled in this dissertation. This work proposes the creation of a system capable of improving speech quality and intelligibility in real-time measured by objective metrics and subjective tests. This system consists of three main components: speech type detection,...

    Pełny tekst do pobrania w portalu

  • Corrupted speech intelligibility improvement using adaptive filter based algorithm

    Publikacja

    A technique for improving the quality of speech signals recorded in strong noise is presented. The proposed algorithmemploying adaptive filtration is described and additional possibilities of speech intelligibility improvement arediscussed. Results of the tests are presented.

  • Enhanced voice user interface employing spatial filtration of signals from acoustic vector sensor

    Spatial filtration of sound is introduced to enhance speech recognition accuracy in noisy conditions. An acoustic vector sensor (AVS) is employed. The signals from the AVS probe are processed in order to attenuate the surrounding noise. As a result the signal to noise ratio is increased. An experiment is featured in which speech signals are disturbed by babble noise. The signals before and after spatial filtration are processed...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Mowa nienawiści (hate speech) a odpowiedzialność dostawców usług internetowych w orzecznictwie sądów europejskich

    Publikacja

    - Rok 2015

    The article analyses the phenomenon of hate speech in the Internet contrasted with the problem of responsability of Internet Service Providers for cases of such abuses of freedom of expression. The text provides an analysis of jurisprudence of two European Courts. On the one hand it presents the position of the European Court of Human Rights on the problem of hate speech: its definition and the liability for it as an exception...

  • Automated detection of pronunciation errors in non-native English speech employing deep learning

    Publikacja

    - Rok 2023

    Despite significant advances in recent years, the existing Computer-Assisted Pronunciation Training (CAPT) methods detect pronunciation errors with a relatively low accuracy (precision of 60% at 40%-80% recall). This Ph.D. work proposes novel deep learning methods for detecting pronunciation errors in non-native (L2) English speech, outperforming the state-of-the-art method in AUC metric (Area under the Curve) by 41%, i.e., from...

    Pełny tekst do pobrania w portalu

  • A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces

    Publikacja
    • G. Tamulevicius
    • G. Korvel
    • A. B. Yayak
    • P. Treigys
    • J. Bernataviciene
    • B. Kostek

    - Electronics - Rok 2020

    In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset with the size of more than 10.000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...

    Pełny tekst do pobrania w portalu

  • System Supporting Speech Perception in Special Educational Needs Schoolchildren

    Publikacja

    - Rok 2012

    The system supporting speech perception during the classes is presented in the paper. The system is a combination of portable device, which enables real-time speech stretching, with the workstation designed in order to perform hearing tests. System was designed to help children suffering from Central Auditory Processing Disorders.

    Pełny tekst do pobrania w serwisie zewnętrznym

  • An audio-visual corpus for multimodal automatic speech recognition

    review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings is provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...

    Pełny tekst do pobrania w portalu

  • SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM

    The main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...

    Pełny tekst do pobrania w portalu

  • Virtual Keyboard controlled by eye gaze employing speech synthesis

    The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Virtual keyboard controlled by eye gaze employing speech synthesis

    Publikacja

    The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

  • Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech

    Publikacja
    • D. Korzekwa
    • J. Lorenzo-trueba
    • T. Drugman
    • S. Calamaro
    • B. Kostek

    - Rok 2021

    We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...

    Pełny tekst do pobrania w portalu

  • Speech Analytics Based on Machine Learning

    In this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Speech synthesis controlled by eye gazing

    Publikacja

    A method of communication based on eye gaze controlling is presented. Investigations of using gaze tracking have been carried out in various context applications. The solution proposed in the paper could be referred to as ''talking by eyes'' providing an innovative approach in the domain of speech synthesis. The application proposed is dedicated to disabled people, especially to persons in a so-called locked-in syndrome who cannot...

  • Tensor Decomposition for Imagined Speech Discrimination in EEG

    Publikacja

    - LECTURE NOTES IN COMPUTER SCIENCE - Rok 2018

    Most of the researches in Electroencephalogram(EEG)-based Brain-Computer Interfaces (BCI) are focused on the use of motor imagery. As an attempt to improve the control of these interfaces, the use of language instead of movement has been recently explored, in the form of imagined speech. This work aims for the discrimination of imagined words in electroencephalogram signals. For this purpose, the analysis of multiple variables...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Distortion of speech signals in the listening area: its mechanism and measurements

    Publikacja

    - Rok 2014

    The paper deals with a problem of the influence of the number and distribution of loudspeakers in speech reinforcement systems on the quality of publicly addressed voice messages, namely on speech intelligibility in the listening area. Linear superposition of time-shifted broadband waves of a same form and slightly different magnitudes that reach a listener from numerous coherent sources, is accompanied by interference effects...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Ranking Speech Features for Their Usage in Singing Emotion Classification

    Publikacja

    This paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...

    Pełny tekst do pobrania w portalu

  • Multimodal English corpus for automatic speech recognition

    A multimodal corpus developed for research of speech recognition based on audio-visual data is presented. Besides usual video and sound excerpts, the prepared database contains also thermovision images and depth maps. All streams were recorded simultaneously, therefore the corpus enables to examine the importance of the information provided by different modalities. Based on the recordings, it is also possible to develop a speech...

  • PHONEME DISTORTION IN PUBLIC ADDRESS SYSTEMS

    Publikacja

    The quality of voice messages in speech reinforcement and public address systems is often poor. The sound engineering projects of such systems take care of sound intensity and possible reverberation phenomena in public space without, however, considering the influence of acoustic interference related to the number and distribution of loudspeakers. This paper presents the results of measurements and numerical simulations of the...

  • Instantaneous complex frequency for pipeline pitch estimation

    Publikacja
    • M. [. Kaniewska

    - Rok 2010

    In the paper a pipeline algorithm for estimating the pitch of speech signal is proposed. The algorithm uses instantaneous complex frequencies estimated for four waveforms obtained by filtering the original speech signal through four bandpass complex Hilbert filters. The imaginary parts of ICFs from each channel give four candidates for pitch estimates. The decision regarding the final estimate is made based on the real parts of...

  • Metoda i algorytmy modyfikacji sygnału do celu wspomagania rozumienia mowy przez osoby z pogorszoną rozdzielczością czasową słuchu

    Publikacja

    - Rok 2013

    Przedmiotem badań przeprowadzonych w ramach rozprawy są metody modyfikacji czasu trwania sygnału (ang. Time Scale Modification –TSM) mowy operujące w czasie rzeczywistym oraz ocena ich wpływu na rozumienie wypowiedzi przez osoby z pogorszoną rozdzielczością czasową słuchu. Pogorszona rozdzielczość słuchu jest jednym z symptomów związanych z ośrodkowymi zaburzeniami słuchu (ang. Cetnral Auditory Processing Disorder – CAPD). W odróżnieniu...

  • Hybrid of Neural Networks and Hidden Markov Models as a modern approach to speech recognition systems

    The aim of this paper is to present a hybrid algorithm that combines the advantages ofartificial neural networks and hidden Markov models in speech recognition for control purpos-es. The scope of the paper includes review of currently used solutions, description and analysis of implementation of selected artificial neural network (NN) structures and hidden Markov mod-els (HMM). The main part of the paper consists of a description...

    Pełny tekst do pobrania w portalu

  • Language material for English audiovisual speech recognition system developmen . Materiał językowy do wykorzystania w systemie audiowizualnego rozpoznawania mowy angielskiej

    Publikacja

    - Rok 2013

    The bi-modal speech recognition system requires a 2-sample language input for training and for testing algorithms which precisely depicts natural English speech. For the purposes of the audio-visual recordings, a training data base of 264 sentences (1730 words without repetitions; 5685 sounds) has been created. The language sample reflects vowel and consonant frequencies in natural speech. The recording material reflects both the...

  • Elimination of clicks from archive speech signals using sparse autoregressive modeling

    Publikacja

    This paper presents a new approach to elimination of impulsivedisturbances from archive speech signals. The proposedsparse autoregressive (SAR) signal representation is given ina factorized form - the model is a cascade of the so-called formantfilter and pitch filter. Such a technique has been widelyused in code-excited linear prediction (CELP) systems, as itguarantees model stability. After detection of noise pulses usinglinear...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Silence/noise detection for speech and music signals

    Publikacja

    - Rok 2008

    This paper introduces a novel off-line algorithm for silence/noise detection in noisy signals. The main concept of the proposed algorithm is to provide noise patterns for further signals processing i.e. noise reduction for speech enhancement. The algorithm is based on frequency domain characteristics of signals. The examples of different types of noisy signals are presented.

  • Bimodal classification of English allophones employing acoustic speech signal and facial motion capture

    A method for automatic transcription of English speech into International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent the visual data related to lip reading can enhance recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Intra-subject class-incremental deep learning approach for EEG-based imagined speech recognition

    Publikacja

    - Biomedical Signal Processing and Control - Rok 2023

    Brain–computer interfaces (BCIs) aim to decode brain signals and transform them into commands for device operation. The present study aimed to decode the brain activity during imagined speech. The BCI must identify imagined words within a given vocabulary and thus perform the requested action. A possible scenario when using this approach is the gradual addition of new words to the vocabulary using incremental learning methods....

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Building Knowledge for the Purpose of Lip Speech Identification

    Consecutive stages of building knowledge for automatic lip speech identification are shown in this study. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared taking into account the frequencies of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Database of speech and facial expressions recorded with optimized face motion capture settings

    The broad objective of the present research is the analysis of spoken English employing a multiplicity of modalities. An important stage of this process, discussed in the paper, is creating a database of speech accompanied with facial expressions. Recordings of speakers were made using an advanced system for capturing facial muscle motion. A brief historical outline, current applications, limitations and the ways of capturing face...

    Pełny tekst do pobrania w portalu

  • Study on Speech Transmission under Varying QoS Parameters in a OFDM Communication System

    Publikacja

    - Rok 2021

    Although there has been an outbreak of multiple multimedia platforms worldwide, speech communication is still the most essential and important type of service. With the spoken word we can exchange ideas, provide descriptive information, as well as aid to another person. As the amount of available bandwidth continues to shrink, researchers focus on novel types of transmission, based most often on multi-valued modulations, multiple...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System

    Publikacja
    • P. Falkowski-Gilski
    • G. Debita
    • M. Habrych
    • B. Miedziński
    • P. Jedlikowski
    • B. Polnik
    • J. Wandzio
    • X. Wang

    - Rok 2020

    The broadband over power line – power line communication (BPL-PLC) cable is resistant to electricity stoppage and partial damage of phase conductors. It maintains continuity of transmission in case of an emergency. These features make it an ideal solution for delivering data, e.g. in an underground mine environment, especially clear and easily understandable voice messages. This paper describes a subjective quality evaluation of...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Objectivization of phonological evaluation of speech elements by means of audio parametrization

    This study addresses two issues related to both machine- and subjective-based speech evaluation by investigating five phonological phenomena related to allophone production. Its aim is to use objective parametrization and phonological classification of the recorded allophones. These allophones were selected as specifically difficult for Polish speakers of English: aspiration, final obstruent devoicing, dark lateral /l/, velar nasal...

  • Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech

    Publikacja
    • D. Piotrowski
    • R. Korzeniowski
    • A. Falai
    • S. Cygert
    • K. Pokora
    • G. Tinchev
    • Z. Zhang
    • K. Yanagisawa

    - Rok 2023

    In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets

    Artificial Neural Network (ANN) models, specifically Convolutional Neural Networks (CNN), were applied to extract emotions based on spectrograms and mel-spectrograms. This study uses spectrograms and mel-spectrograms to investigate which feature extraction method better represents emotions and how big the differences in efficiency are in this context. The conducted studies demonstrated that mel-spectrograms are a better-suited...

    Pełny tekst do pobrania w portalu

  • Introduction to the special issue on machine learning in acoustics

    Publikacja
    • Z. Michalopoulou
    • P. Gerstoft
    • B. Kostek
    • M. A. Roch

    - Journal of the Acoustical Society of America - Rok 2021

    When we started our Call for Papers for a Special Issue on “Machine Learning in Acoustics” in the Journal of the Acoustical Society of America, our ambition was to invite papers in which machine learning was applied to all acoustics areas. They were listed, but not limited to, as follows: • Music and synthesis analysis • Music sentiment analysis • Music perception • Intelligent music recognition • Musical source separation • Singing...

    Pełny tekst do pobrania w portalu

  • Communication Platform for Evaluation of Transmitted Speech Quality

    A voice communication system designed and implemented is described. The purpose of the presented platform was to enable a series of experiments related to the quality assessment of algorithms used in the coding and transmitting of speech. The system is equipped with tools for recording signals at each stage of processing, making it possible to subject them to subjective assessments by listening tests or, objective evaluation employing...

    Pełny tekst do pobrania w portalu

  • Human-computer interactions in speech therapy using a blowing interface

    Publikacja

    In this paper we present a new human-computer interface for the quantitative measurement of blowing activities. The interface can measure the air flow and air pressure during the blowing activity. The measured values are stored and used to control the state of the graphical objects in the graphical user interface. In speech therapy children will find easier to play attractive therapeutic games than to perform repetitive and tedious,...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Wzrost gospodarczy a zapotrzebowanie na prace - teoria i rzeczywistość gospodarcza Polski w latach 1995-2014

    W części teoretycznej artykułu w pierwszej kolejności przedstawiono makroekonomiczne podstawy zapotrzebowania na pracę w warunkach postępu technicznego oraz zmian kapitału rzeczowego. W następnej kolejności sformułowano założenia dla modelu opisującego zależności pomiędzy stopami wzrostu produktu krajowego i zatrudnienia. W części empirycznej artykułu rozważano wybrane wersje oszacowanego modelu opisujące gospodarkę Polski. Do...

    Pełny tekst do pobrania w portalu

  • Visual Lip Contour Detection for the Purpose of Speech Recognition

    Publikacja

    A method for visual detection of lip contours in frontal recordings of speakers is described and evaluated. The purpose of the method is to facilitate speech recognition with visual features extracted from a mouth region. Different Active Appearance Models are employed for finding lips in video frames and for lip shape and texture statistical description. Search initialization procedure is proposed and error measure values are...

  • Investigating Feature Spaces for Isolated Word Recognition

    Publikacja

    - Rok 2018

    Much attention is given by researchers to the speech processing task in automatic speech recognition (ASR) over the past decades. The study addresses the issue related to the investigation of the appropriateness of a two-dimensional representation of speech feature spaces for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and timefrequency signal representation...

  • Marking the Allophones Boundaries Based on the DTW Algorithm

    Publikacja

    - Rok 2018

    The paper presents an approach to marking the boundaries of allophones in the speech signal based on the Dynamic Time Warping (DTW) algorithm. Setting and marking of allophones boundaries in continuous speech is a difficult issue due to the mutual influence of adjacent phonemes on each other. It is this neighborhood on the one hand that creates variants of phonemes that is allophones, and on the other hand it affects that the border...