Wyniki wyszukiwania dla: ALLOPHONIC SPEECH TRANSCRIPTION - MOST Wiedzy

Wyszukiwarka

Wyniki wyszukiwania dla: ALLOPHONIC SPEECH TRANSCRIPTION

Wyniki wyszukiwania dla: ALLOPHONIC SPEECH TRANSCRIPTION

  • Methodology and technology for the polymodal allophonic speech transcription

    A method for automatic audiovisual transcription of speech employing: acoustic and visual speech representations is developed. It adopts a combining of audio and visual modalities, which provide a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e. the changes in the articulatory setting of speech organs for...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Methodology and technology for the polymodal allophonic speech transcription

    A method for automatic audiovisual transcription of speech employing: acoustic, electromagnetical articulography and visual speech representations is developed. It adopts a combining of audio and visual modalities, which provide a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e., the changes in the articulatory...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Material for Automatic Phonetic Transcription of Speech Recorded in Various Conditions

    Publikacja

    Automatic speech recognition (ASR) is under constant development, especially in cases when speech is casually produced or it is acquired in various environment conditions, or in the presence of background noise. Phonetic transcription is an important step in the process of full speech recognition and is discussed in the presented work as the main focus in this process. ASR is widely implemented in mobile devices technology, but...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Analysis of allophones based on audio signal recordings and parameterization

    The aim of this study is to develop an allophonic description of English plosive consonants based on recordings of 600 specially selected words. Allophonic variations addressed in the study may have two sources: positional and contextual. The former one depends on the syllabic or prosodic position in which a particular phoneme occurs. Contextual allophony is conditioned by the local phonetic environment. Co-articulation overlapping...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Bimodal classification of English allophones employing acoustic speech signal and facial motion capture

    A method for automatic transcription of English speech into International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent the visual data related to lip reading can enhance recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Building Knowledge for the Purpose of Lip Speech Identification

    Consecutive stages of building knowledge for automatic lip speech identification are shown in this study. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared taking into account the frequencies of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • MACHINE LEARNING–BASED ANALYSIS OF ENGLISH LATERAL ALLOPHONES

    Automatic classification methods, such as artificial neural networks (ANNs), the k-nearest neighbor (kNN) and selforganizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words was created for that purpose, containing positionally and/or contextually conditioned allophones. For each word, a group of 16 native and non-native speakers were audio-video recorded, from which seven native speakers’...

    Pełny tekst do pobrania w portalu

  • Marking the Allophones Boundaries Based on the DTW Algorithm

    Publikacja

    - Rok 2018

    The paper presents an approach to marking the boundaries of allophones in the speech signal based on the Dynamic Time Warping (DTW) algorithm. Setting and marking of allophones boundaries in continuous speech is a difficult issue due to the mutual influence of adjacent phonemes on each other. It is this neighborhood on the one hand that creates variants of phonemes that is allophones, and on the other hand it affects that the border...

  • Determining Pronunciation Differences in English Allophones Utilizing Audio Signal Parameterization

    Publikacja

    - Rok 2017

    An allophonic description of English plosive consonants, based on audio-visual recordings of 600 specially selected words, was developed. First, several speakers were recorded while reading words from a teleprompter. Then, every word was played back from the previously recorded sample read by a phonology expert and each examined speaker repeated a particular word trying to imitate correct pronunciation. The next step consisted...

  • Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement

    Publikacja

    - IEEE Access - Rok 2020

    The Lombard effect is one of the most well-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech...

    Pełny tekst do pobrania w portalu

  • Transcription-Austin

    Czasopisma

    ISSN: 2154-1264 , eISSN: 2154-1272

  • Speech Intelligibility Measurements in Auditorium

    Publikacja

    Speech intelligibility was measured in Auditorium Novum on Technical University of Gdansk (seating capacity 408, volume 3300 m3). Articulation tests were conducted; STI and Early Decay Time EDT coefficients were measured. Negative noise contribution to speech intelligibility was taken into account. Subjective measurements and objective tests reveal high speech intelligibility at most seats in auditorium. Correlation was found between...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Transient detection for speech coding applications

    Signal quality in speech codecs may be improved by selecting transients from speech signal and encoding them using a suitable method. This paper presents an algorithm for transient detection in speech signal. This algorithm operates in several frequency bands. Transient detection functions are calculated from energy measured in short frames of the signal. The final selection of transient frames is based on results of detection...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Improving the quality of speech in the conditions of noise and interference

    Publikacja

    The aim of the work is to present a method of intelligent modification of the speech signal with speech features expressed in noise, based on the Lombard effect. The recordings utilized sets of words and sentences as well as disturbing signals, i.e., pink noise and the so-called babble speech. Noise signal, calibrated to various levels at the speaker's ears, was played over two loudspeakers located 2 m away from the speaker. In...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Constructing a Dataset of Speech Recordingswith Lombard Effect

    Publikacja

    - Rok 2020

    Thepurpose of therecordings was to create a speech corpus based on the ISLEdataset, extended with video and Lombard speech. Selected from a set of 165sentences, 10, evaluatedas having thehighest possibility to occur in the context ofthe Lombard effect,were repeated in the presence of the so-called babble speech to obtain Lombard speech features. Altogether,15speakers were recorded, and speech parameterswere...

  • Improved method for real-time speech stretching

    Publikacja

    n algorithm for real-time speech stretching is presented. It was designed to modify input signal dependently on its content and on its relation with the historical input data. The proposed algorithm is a combination of speech signal analysis algorithms, i.e. voice, vowels/consonants, stuttering detection and SOLA (Synchronous-Overlap-and-Add) based speech stretching algorithm. This approach enables stretching input speech signal...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Real-time speech-rate modification experiments

    Publikacja

    An algorithm designed for real-time speech time scale modification (stretching) is proposed, providing a combination of typical synchronous overlap and add based time scale modification algorithm and signal redundancy detection algorithms that allow to remove parts of the speech signal and replace them with the stretched speech signal fragments. Effectiveness of signal processing algorithms are examined experimentally together...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Improving Objective Speech Quality Indicators in Noise Conditions

    Publikacja

    - Rok 2020

    This work aims at modifying speech signal samples and test them with objective speech quality indicators after mixing the original signals with noise or with an interfering signal. Modifications that are applied to the signal are related to the Lombard speech characteristics, i.e., pitch shifting, utterance duration changes, vocal tract scaling, manipulation of formants. A set of words and sentences in Polish, recorded in silence,...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Time-domain prosodic modifications for text-to-speech synthesizer

    Publikacja

    - Rok 2010

    An application of prosodic speech processing algorithms to Text-To-Speech synthesis is presented. Prosodic modifications that improve the naturalness of the synthesized signal are discussed. The applied method is based on the TD-PSOLA algorithm. The developed Text-To-Speech Synthesizer is used in applications employing multimodal computer interfaces.

  • Speech Analytics Based on Machine Learning

    In this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Speech synthesis controlled by eye gazing

    Publikacja

    A method of communication based on eye gaze controlling is presented. Investigations of using gaze tracking have been carried out in various context applications. The solution proposed in the paper could be referred to as ''talking by eyes'' providing an innovative approach in the domain of speech synthesis. The application proposed is dedicated to disabled people, especially to persons in a so-called locked-in syndrome who cannot...

  • A Method of Real-Time Non-uniform Speech Stretching

    Publikacja

    Developed method of real-time non-uniform speech stretching is presented.The proposed solution is based on the well-known SOLA algorithm(Synchronous Overlap and Add). Non-uniform time-scale modification isachieved by the adjustment of time scaling factor values in accordance with thesignal content. Dependently on the speech unit (vowels/consonants), instantaneousrate of speech (ROS), and speech signal presence, values of the scalingfactor...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

    Publikacja
    • D. Korzekwa
    • R. Barra-Chicote
    • B. Kostek
    • T. Drugman
    • M. Łajszczak

    - Rok 2019

    We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...

    Pełny tekst do pobrania w portalu

  • Examining Influence of Distance to Microphone on Accuracy of Speech Recognition

    The problem of controlling a machine by the distant-talking speaker without a necessity of handheld or body-worn equipment usage is considered. A laboratory setup is introduced for examination of performance of the developed automatic speech recognition system fed by direct and by distant speech acquired by microphones placed at three different distances from the speaker (0.5 m to 1.5 m). For feature extraction from the voice signal...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Comparison of various speech time-scale modificartion methods

    The objective of this work is to investigate the influence of the different time-scale modification (TSM) methods on the quality of the speech stretched up using the designed non-uniform real-time speech time-scale modification algorithm (NU-RTSM). The algorithm provides a combination of the typical TSM algorithm with the vowels, consonants, stutter, transients and silence detectors. Based on the information about the content and...

  • Speech codec enhancements utilizing time compression and perceptual coding

    Publikacja

    A method for encoding wideband speech signal employing standardized narrowband speech codecs is presented as well as experimental results concerning detection of tonal spectral components. The speech signal sampled with a higher sampling rate than it is suitable for narrowband coding algorithm is compressed in order to decrease the amount of samples. Next, the time-compressed representation of a signal is encoded using a narrowband...

  • Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit

    Methods developed for real-time time scale modification (TSM) of speech signal are presented. They are based onthe non-uniform, speech rate depended SOLA algorithm (Synchronous Overlap and Add). Influence of theproposed method on the intelligibility of speech was investigated for two separate groups of listeners, i.e. hearingimpaired children and elderly listeners. It was shown that for the speech with average rate equal to or...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Multimodal English corpus for automatic speech recognition

    A multimodal corpus developed for research of speech recognition based on audio-visual data is presented. Besides usual video and sound excerpts, the prepared database contains also thermovision images and depth maps. All streams were recorded simultaneously, therefore the corpus enables to examine the importance of the information provided by different modalities. Based on the recordings, it is also possible to develop a speech...

  • An audio-visual corpus for multimodal automatic speech recognition

    review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings is provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...

    Pełny tekst do pobrania w portalu

  • Comparison of Acoustic and Visual Voice Activity Detection for Noisy Speech Recognition

    Publikacja

    The problem of accurate differentiating between the speaker utterance and the noise parts in a speech signal is considered. The influence of utilizing a voice activity detection in speech signals on the accuracy of the automatic speech recognition (ASR) system is presented. The examined methods of voice activity detection are based on acoustic and visual modalities. The problem of detecting the voice activity in clean and noisy...

  • Ranking Speech Features for Their Usage in Singing Emotion Classification

    Publikacja

    This paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...

    Pełny tekst do pobrania w portalu

  • System Supporting Speech Perception in Special Educational Needs Schoolchildren

    Publikacja

    - Rok 2012

    The system supporting speech perception during the classes is presented in the paper. The system is a combination of portable device, which enables real-time speech stretching, with the workstation designed in order to perform hearing tests. System was designed to help children suffering from Central Auditory Processing Disorders.

    Pełny tekst do pobrania w serwisie zewnętrznym

  • High quality speech codec employing sines+noise+transients model

    A method of high quality wideband speech signal representation employing sines+transients+noise model is presented. The need for a wideband speech coding approach as well as various methods for analysis and synthesis of sines, residual and transient states of speech signal is discussed. The perceptual criterion is applied in the proposed approach during encoding of sines amplitudes in order to reduce bandwidth requirements and...

  • Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions

    Publikacja

    - Rok 2018

    The aim of the work is to analyze Lombard speech effect in recordings and then modify the speech signal in order to obtain an increase in the improvement of objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of the Lombard speech, and in particular on the effect of increasing the fundamental frequency...

  • Virtual keyboard controlled by eye gaze employing speech synthesis

    Publikacja

    The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

  • Virtual Keyboard controlled by eye gaze employing speech synthesis

    The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Silence/noise detection for speech and music signals

    Publikacja

    - Rok 2008

    This paper introduces a novel off-line algorithm for silence/noise detection in noisy signals. The main concept of the proposed algorithm is to provide noise patterns for further signals processing i.e. noise reduction for speech enhancement. The algorithm is based on frequency domain characteristics of signals. The examples of different types of noisy signals are presented.

  • An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics

    Publikacja

    - Rok 2019

    The speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...

    Pełny tekst do pobrania w portalu

  • Corrupted speech intelligibility improvement using adaptive filter based algorithm

    Publikacja

    A technique for improving the quality of speech signals recorded in strong noise is presented. The proposed algorithmemploying adaptive filtration is described and additional possibilities of speech intelligibility improvement arediscussed. Results of the tests are presented.

  • Distortion of speech signals in the listening area: its mechanism and measurements

    Publikacja

    - Rok 2014

    The paper deals with a problem of the influence of the number and distribution of loudspeakers in speech reinforcement systems on the quality of publicly addressed voice messages, namely on speech intelligibility in the listening area. Linear superposition of time-shifted broadband waves of a same form and slightly different magnitudes that reach a listener from numerous coherent sources, is accompanied by interference effects...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • A non-uniform real-time speech time-scale stretching method

    Publikacja

    An algorithm for non-uniform real-time speech stretching is presented. It provides a combination of typical SOLA algorithm (Synchronous Overlap and Add ) with the vowels, consonants and silence detectors. Based on the information about the content and the estimated value of the rate of speech (ROS), the algorithm adapts the scaling factor value. The ability of real-time speech stretching and the resultant quality of voice were...

  • A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces

    Publikacja
    • G. Tamulevicius
    • G. Korvel
    • A. B. Yayak
    • P. Treigys
    • J. Bernataviciene
    • B. Kostek

    - Electronics - Rok 2020

    In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset with the size of more than 10.000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...

    Pełny tekst do pobrania w portalu

  • Communication Platform for Evaluation of Transmitted Speech Quality

    A voice communication system designed and implemented is described. The purpose of the presented platform was to enable a series of experiments related to the quality assessment of algorithms used in the coding and transmitting of speech. The system is equipped with tools for recording signals at each stage of processing, making it possible to subject them to subjective assessments by listening tests or, objective evaluation employing...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

    In this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). New iterative algorithm for analysis of ICF of speech signal in presented. Obtained results are compared with commonly used methods to prove its accuracy and connection between ICF and pitch, particularly for narrowband-filtered speech signal.

  • Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

    Publikacja

    - Rok 2007

    In this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). New iterative algorithm for analysis of ICF of speech signal in presented. Obtained results are compared with commonly used methods to prove its accuracy and connection between ICF and pitch, particularly for narrowband-filtered speech signal.

  • Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech

    Publikacja
    • D. Korzekwa
    • J. Lorenzo-Trueba
    • T. Drugman
    • S. Calamaro
    • B. Kostek

    - Rok 2021

    We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...

    Pełny tekst do pobrania w portalu

  • Mowa nienawiści (hate speech) a odpowiedzialność dostawców usług internetowych w orzecznictwie sądów europejskich

    Publikacja

    - Rok 2015

    The article analyses the phenomenon of hate speech in the Internet contrasted with the problem of responsability of Internet Service Providers for cases of such abuses of freedom of expression. The text provides an analysis of jurisprudence of two European Courts. On the one hand it presents the position of the European Court of Human Rights on the problem of hate speech: its definition and the liability for it as an exception...

  • Visual Lip Contour Detection for the Purpose of Speech Recognition

    Publikacja

    A method for visual detection of lip contours in frontal recordings of speakers is described and evaluated. The purpose of the method is to facilitate speech recognition with visual features extracted from a mouth region. Different Active Appearance Models are employed for finding lips in video frames and for lip shape and texture statistical description. Search initialization procedure is proposed and error measure values are...

  • Objectivization of phonological evaluation of speech elements by means of audio parametrization

    This study addresses two issues related to both machine- and subjective-based speech evaluation by investigating five phonological phenomena related to allophone production. Its aim is to use objective parametrization and phonological classification of the recorded allophones. These allophones were selected as specifically difficult for Polish speakers of English: aspiration, final obstruent devoicing, dark lateral /l/, velar nasal...

  • Elimination of clicks from archive speech signals using sparse autoregressive modeling

    Publikacja

    This paper presents a new approach to elimination of impulsivedisturbances from archive speech signals. The proposedsparse autoregressive (SAR) signal representation is given ina factorized form - the model is a cascade of the so-called formantfilter and pitch filter. Such a technique has been widelyused in code-excited linear prediction (CELP) systems, as itguarantees model stability. After detection of noise pulses usinglinear...

    Pełny tekst do pobrania w serwisie zewnętrznym