Wyniki wyszukiwania dla: rate of speech estimation - MOST Wiedzy

Wyszukiwarka

Wyniki wyszukiwania dla: rate of speech estimation

Wyniki wyszukiwania dla: rate of speech estimation

  • POPRAWA OBIEKTYWNYCH WSKAŹNIKÓW JAKOŚCI MOWY W WARUNKACH HAŁASU

    Celem pracy jest modyfikacja sygnału mowy, aby uzyskać zwiększenie poprawy obiektywnych wskaźników jakości mowy po zmiksowaniu sygnału użytecznego z szumem bądź z sygnałem zakłócającym. Wykonane modyfikacje sygnału bazują na cechach mowy lombardzkiej, a w szczególności na efekcie podniesienia częstotliwości podstawowej F0. Sesja nagraniowa obejmowała zestawy słów i zdań w języku polskim, nagrane w warunkach ciszy, jak również w...

    Pełny tekst do pobrania w portalu

  • Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention

    Publikacja

    - Rok 2021

    This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...

    Pełny tekst do pobrania w portalu

  • Zastosowanie spowalniania wypowiedzi w celu poprawy rozumienia mowy przez dzieci w szkole

    Publikacja

    This paper presents a time-scale modification algorithms that could be used for hearing impairment therapy supported by real-time speech stretching. In this paper the OLA based algorithms and Phase Vocoder were described. In the experimental part usability of those algorithms for real-time speech stretching was discussed

  • Difference in Perceived Speech Signal Quality Assessment Among Monolingual and Bilingual Teenage Students

    Publikacja

    - Rok 2021

    The user perceived quality is a mixture of factors, including the background of an individual. The process of auditory perception is discussed in a wide variety of fields, ranging from engineering to medicine. Many studies examine the difference between musicians and non-musicians. Since musical training develops musical hearing and other various auditory capabilities, similar enhancements should be observable in case of bilingual...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • XVIII Międzynarodowe Sympozjum Inżynierii i Reżyserii Dźwięku

    Publikacja

    - Rok 2021

    The subjective assessment of speech signals takes into account previous experiences and habits of an individual. Since the perception process deteriorates with age, differences should be noticeable among people from dissimilar age groups. In this work, we investigated the difference of speech quality assessment between high school students and university students. The study involved 60 participants, with 30 people in both the adolescents...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Noise profiling for speech enhancement employing machine learning models

    Publikacja

    - Journal of the Acoustical Society of America - Rok 2022

    This paper aims to propose a noise profiling method that can be performed in near real-time based on machine learning (ML). To address challenges related to noise profiling effectively, we start with a critical review of the literature background. Then, we outline the experiment performed consisting of two parts. The first part concerns the noise recognition model built upon several baseline classifiers and noise signal features...

    Pełny tekst do pobrania w portalu

  • Creating new voices using normalizing flows

    Publikacja
    • P. Biliński
    • T. Merritt
    • A. Ezzerg
    • K. Pokora
    • S. Cygert
    • K. Yanagisawa
    • R. Barra-Chicote
    • D. Korzekwa

    - Rok 2022

    Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. Firstly, we create an approach for TTS...

    Pełny tekst do pobrania w portalu

  • Transfer learning in imagined speech EEG-based BCIs

    Publikacja

    - Biomedical Signal Processing and Control - Rok 2019

    The Brain–Computer Interfaces (BCI) based on electroencephalograms (EEG) are systems which aim is to provide a communication channel to any person with a computer, initially it was proposed to aid people with disabilities, but actually wider applications have been proposed. These devices allow to send messages or to control devices using the brain signals. There are different neuro-paradigms which evoke brain signals of interest...

    Pełny tekst do pobrania w portalu

  • Badanie i analiza efektywności alokacji strumieni danych w heterogenicznej sieci WBAN

    Publikacja

    - Rok 2017

    W niniejszej dysertacji doktorskiej poddano dyskusji efektywność alokacji strumieni danych w heterogenicznej radiowej sieci WBAN (Wireless Body Area Networks). Biorąc pod uwagę dynamiczny rozwój nowoczesnych sieci radiokomunikacyjnych piątej generacji (5G), którego część stanowią radiowe sieci działające w obrębie ciała człowieka, bardzo ważnym aspektem są metody maksymalizujące wykorzystanie dostępnych zasobów czasowo –częstotliwościowych...

    Pełny tekst do pobrania w portalu

  • Human voice modification using instantaneous complex frequency

    Publikacja
    • M. Kaniewska

    - Rok 2010

    The paper presents the possibilities of changing human voice by modifying instantaneous complex frequency (ICF) of the speech signal. The proposed method provides a flexible way of altering voice without the necessity of finding fundamental frequency and formants' positions or detecting voiced and unvoiced fragments of speech. The algorithm is simple and fast. Apart from ICF it uses signal factorization into two factors: one fully...

  • Investigating Feature Spaces for Isolated Word Recognition

    Publikacja
    • P. Treigys
    • G. Korvel
    • G. Tamulevicius
    • J. Bernataviciene
    • B. Kostek

    - Rok 2020

    The study addresses the issues related to the appropriateness of a two-dimensional representation of speech signal for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Auditory-visual attention stimulator

    New approach to lateralization irregularities formation was proposed. The emphasis is put on the relationship between visual and auditory attention stimulation. In this approach hearing is stimulated using time scale modified speech and sight is stimulated by rendering the text of the currently heard speech. Moreover, displayed text is modified using several techniques i.e. zooming, highlighting etc. In the experimental part of...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • INVESTIGATION OF THE LOMBARD EFFECT BASED ON A MACHINE LEARNING APPROACH

    Publikacja

    The Lombard effect is an involuntary increase in the speaker’s pitch, intensity, and duration in the presence of noise. It makes it possible to communicate in noisy environments more effectively. This study aims to investigate an efficient method for detecting the Lombard effect in uttered speech. The influence of interfering noise, room type, and the gender of the person on the detection process is examined. First, acoustic parameters...

    Pełny tekst do pobrania w portalu

  • Audio-visual aspect of the Lombard effect and comparison with recordings depicting emotional states.

    In this paper an analysis of audio-visual recordings of the Lombard effect is shown. First, audio signal is analyzed indicating the presence of this phenomenon in the recorded sessions. The principal aim, however, was to discuss problems related to extracting differences caused by the Lombard effect, present in the video , i.e. visible as tension and work of facial muscles aligned to an increase in the intensity of the articulated...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Performance Analysis of the OpenCL Environment on Mobile Platforms

    Publikacja

    Today’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Prof. Haitham Abu-Rub - A Visit to Poland's Gdansk University of Technology

    Report on visit of Prof. Haitham Abu-Rub in Gdansk University of Technology. Speech on the Smart Grid Centre. Visit in the new smart grid laboratory of the GUT, the Laboratory for Innovative Power Technologies and Integration of Renewable Energy Sources (LINTE^2).

    Pełny tekst do pobrania w portalu

  • Mispronunciation Detection in Non-Native (L2) English with Uncertainty Modeling

    Publikacja

    - Rok 2021

    A common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare it to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: a) phonemes can be recognized from speech with high accuracy, b) there is a single correct way for a sentence to be pronounced. These assumptions do not always hold, which can result...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • A Comparison of STI Measured by Direct and Indirect Methods for Interiors Coupled with Sound Reinforcement Systems

    Publikacja

    This paper presents a comparison of STI (Speech Transmission Index) coefficient measurement results carried out by direct and indirect methods. First, acoustic parameters important in the context of public address and sound reinforcement systems are recalled. A measurement methodology is presented that employs various test signals to determine impulse responses. The process of evaluating sound system performance, signals enabling...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Wpływ liczby mieszkańców na obliczenia ilości ścieków w projektowaniu kanalizacji sanitarnej

    Publikacja

    Zmiana sposobu rozliczania zużycia wody, promocja wodooszczędnych technologii oraz uszczelnienie kanałów wyraźnie wpłynęły na ilość odprowadzanych do kanalizacji ścieków. Obecnie stwierdzić można, że opracowania z lat 70-tych XX wieku podają zawyżone wskaźniki odpływu ścieków, skutkiem czego są problemy eksploatacyjne. Bez wątpienia, prawidłowe zaprojektowanie kanalizacji sanitarnej to zadanie trudne, wymagające między innymi prawidłowego...

  • Rediscovering Automatic Detection of Stuttering and Its Subclasses through Machine Learning—The Impact of Changing Deep Model Architecture and Amount of Data in the Training Set

    Publikacja

    - Applied Sciences-Basel - Rok 2023

    This work deals with automatically detecting stuttering and its subclasses. An effective classification of stuttering along with its subclasses could find wide application in determining the severity of stuttering by speech therapists, preliminary patient diagnosis, and enabling communication with the previously mentioned voice assistants. The first part of this work provides an overview of examples of classical and deep learning...

    Pełny tekst do pobrania w portalu

  • A comparative study of English viseme recognition methods and algorithms

    An elementary visual unit – the viseme is concerned in the paper in the context of preparing the feature vector as a main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative research on their effectiveness. In the course of the study an optimal feature vector construction...

    Pełny tekst do pobrania w portalu

  • Modeling and Designing Acoustical Conditions of the Interior – Case Study

    The primary aim of this research study was to model acoustic conditions of the Courtyard of the Gdańsk University of Technology Main Building, and then to design a sound reinforcement system for this interior. First, results of measurements of the parameters of the acoustic field are presented. Then, the comparison between measured and predicted values using the ODEON program is shown. Collected data indicate a long reverberation...

    Pełny tekst do pobrania w portalu

  • Comparative analysis of various transformation techniques for voiceless consonants modeling

    Publikacja

    In this paper, a comparison of various transformation techniques, namely Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT) and Discrete Walsh Hadamard Transform (DWHT) are performed in the context of their application to voiceless consonant modeling. Speech features based on these transformation techniques are extracted. These features are mean and derivative values of cepstrum coefficients, derived from each transformation....

    Pełny tekst do pobrania w portalu

  • A comparative study of English viseme recognition methods and algorithm

    An elementary visual unit – the viseme is concerned in the paper in the context of preparing the feature vector as a main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative research on their effectiveness. In the course of the study an optimal feature vector...

    Pełny tekst do pobrania w portalu

  • Playback detection using machine learning with spectrogram features approach

    Publikacja

    This paper presents 2D image processing approach to playback detection in automatic speaker verification (ASV) systems using spectrograms as speech signal representation. Three feature extraction and classification methods: histograms of oriented gradients (HOG) with support vector machines (SVM), HAAR wavelets with AdaBoost classifier and deep convolutional neural networks (CNN) were compared on different data partitions in respect...

    Pełny tekst do pobrania w portalu

  • Evaluation Criteria for Affect-Annotated Databases

    In this paper a set of comprehensive evaluation criteria for affect-annotated databases is proposed. These criteria can be used for evaluation of the quality of a database on the stage of its creation as well as for evaluation and comparison of existing databases. The usefulness of these criteria is demonstrated on several databases selected from affect computing domain. The databases contain different kind of data: video or still...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Intelligent video and audio applications for learning enhancement

    The role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Intelligent multimedia solutions supporting special education needs.

    The role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....

  • New approach for determining the QoS of MP3-coded voice signals in IP networks

    Publikacja

    Present-day IP transport platforms being what they are, it will never be possible to rule out conflicts between the available services. The logical consequence of this assertion is the inevitable conclusion that the quality of service (QoS) must always be quantifiable no matter what. This paper focuses on one method to determine QoS. It defines an innovative, simple model that can evaluate the QoS of MP3-coded voice data transported...

    Pełny tekst do pobrania w portalu

  • Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging

    Publikacja

    In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpora of annotated Polish speech data. We propose a MPI-based modification of the training program which minimizes the...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Analysis-by-synthesis paradigm evolved into a new concept

    This work aims at showing how the well-known analysis-by-synthesis paradigm has recently been evolved into a new concept. However, in contrast to the original idea stating that the created sound should not fail to pass the foolproof synthesis test, the recent development is a consequence of the need to create new data. Deep learning models are greedy algorithms requiring a vast amount of data that, in addition, should be correctly...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • DEVELOPMENT OF THE ALGORITHM OF POLISH LANGUAGE FILM REVIEWS PREPROCESSING

    The algorithm and the software for conducting the procedure of Preprocessing of the reviews of films in the Polish language were developed. This algorithm contains the following steps: Text Adaptation Procedure; Procedure of Tokenization; Procedure of Transforming Words into the Byte Format; Part-of-Speech Tagging; Stemming / Lemmatization Procedure; Presentation of Documents in the Vector Form (Vector Space Model) Procedure; Forming...

    Pełny tekst do pobrania w portalu

  • Selection of Features for Multimodal Vocalic Segments Classification

    Publikacja

    English speech recognition experiments are presented employing both: audio signal and Facial Motion Capture (FMC) recordings. The principal aim of the study was to evaluate the influence of feature vector dimension reduction for the accuracy of vocalic segments classification employing neural networks. Several parameter reduction strategies were adopted, namely: Extremely Randomized Trees, Principal Component Analysis and Recursive...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • A study on signal processing methods applied to hearing aids

    Publikacja

    - Rok 2016

    This paper presents a short survey on current technology available in hearing aids with a focus on digital signal processing techniques used. First, factors influencing the hearing aid effectiveness are introduced. Then, examples of the present DSP methods and strategies are provided. Also, a description of current limitations of hearing aids and future trends of development are shown. Finally, the notion of computational auditory...

  • Results of tests on speech intelligibility in reverberant conditions

    Dane Badawcze

    The dataset contains the results of tests that aimed to provide a relationship between the rate of speech (RoS) and reverberation conditions characterized by the Speech Transmission Index (STI).

  • MACHINE LEARNING–BASED ANALYSIS OF ENGLISH LATERAL ALLOPHONES

    Automatic classification methods, such as artificial neural networks (ANNs), the k-nearest neighbor (kNN) and selforganizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words was created for that purpose, containing positionally and/or contextually conditioned allophones. For each word, a group of 16 native and non-native speakers were audio-video recorded, from which seven native speakers’...

    Pełny tekst do pobrania w portalu

  • Vocalic Segments Classification Assisted by Mouth Motion Capture

    Visual features convey important information for automatic speech recognition (ASR), especially in noisy environment. The purpose of this study is to evaluate to what extent visual data (i.e. lip reading) can enhance recognition accuracy in the multi-modal approach. For that purpose motion capture markers were placed on speakers' faces to obtain lips tracking data during speaking. Different parameterizations strategies were tested...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • A Device for Measuring Auditory Brainstem Responses to Audio

    Standard ABR devices use clicks and tone bursts to assess subjects’ hearing in an objective way. A new device was developed that extends the functionality of a standard ABR audiometer by collecting and analyzing auditory brainstem responses (ABR). The developed accessory allows for the use of complex sounds (e.g., speech or music excerpts) as stimuli. Therefore, it is possible to find out how efficiently different types of sounds...

    Pełny tekst do pobrania w portalu

  • Examining Feature Vector for Phoneme Recognition / Analiza parametrów w kontekście automatycznej klasyfikacji fonemów

    Publikacja

    - Rok 2017

    The aim of this paper is to analyze usability of descriptors coming from music information retrieval to the phoneme analysis. The case study presented consists in several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...

  • Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training

    In the paper we tackle bi-objective execution time and power consumption optimization problem concerning execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...

    Pełny tekst do pobrania w portalu

  • Secured wired BPL voice transmission system

    Publikacja

    - Scientific Journal of the Military University of Land Forces - Rok 2020

    Designing a secured voice transmission system is not a trivial task. Wired media, thanks to their reliability and resistance to mechanical damage, seem an ideal solution. The BPL (Broadband over Power Line) cable is resistant to electricity stoppage and partial damage of phase conductors, ensuring continuity of transmission in case of an emergency. It seems an appropriate tool for delivering critical data, mostly clear and understandable...

    Pełny tekst do pobrania w portalu

  • Comparison of Lithuanian and Polish Consonant Phonemes Based on Acoustic Analysis – Preliminary Results

    Publikacja

    - Archives of Acoustics - Rok 2019

    The goal of this research is to find a set of acoustic parameters that are related to differences between Polish and Lithuanian language consonants. In order to identify these differences, an acoustic analysis is performed, and the phoneme sounds are described as the vectors of acoustic parameters. Parameters known from the speech domain as well as those from the music information retrieval area are employed. These parameters are...

    Pełny tekst do pobrania w portalu

  • Multimedia industrial and medical applications supported by machine learning

    Publikacja

    - Rok 2023

    This article outlines a keynote paper presented at the Intelligent DecisionTechnologies conference providing a part of the KES Multi-theme Conference “Smart Digital Futures” organized in Rome on June 14–16, 2023. It briefly discusses projects related to traffic control using developed intelligent traffic signs and diagnosing the health of wind turbine mechanisms and multimodal biometric authentication for banking branches to provide...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Voice command recognition using hybrid genetic algorithm

    Publikacja

    Abstract: Speech recognition is a process of converting the acoustic signal into a set of words, whereas voice command recognition consists in the correct identification of voice commands, usually single words. Voice command recognition systems are widely used in the military, control systems, electronic devices, such as cellular phones, or by people with disabilities (e.g., for controlling a wheelchair or operating a computer...

    Pełny tekst do pobrania w portalu

  • The Innovative Faculty for Innovative Technologies

    A leaflet describing Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology. Multimedia Systems Department described laboratories and prototypes of: Auditory-visual attention stimulator, Automatic video event detection, Object re-identification application for multi-camera surveillance systems, Object Tracking and Automatic Master-Slave PTZ Camera Positioning System, Passive Acoustic Radar,...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Evaluation of Respiration Rate Using Thermal Imaging in Mobile Conditions

    Publikacja

    Respiratory rate is very important vital sign that should be measured and documented in many medical situations. The remote measurement of respiration rate can be especially valuable for medical screening purposes (e.g. severe acute respiratory syndrome (SARS), pandemic influenza, etc.). In this chapter we present a review of many different studies focused on the measurements and estimation of respiration rate using thermal imaging...

    Pełny tekst do pobrania w serwisie zewnętrznym

  • Determining Pronunciation Differences in English Allophones Utilizing Audio Signal Parameterization

    Publikacja

    - Rok 2017

    An allophonic description of English plosive consonants, based on audio-visual recordings of 600 specially selected words, was developed. First, several speakers were recorded while reading words from a teleprompter. Then, every word was played back from the previously recorded sample read by a phonology expert and each examined speaker repeated a particular word trying to imitate correct pronunciation. The next step consisted...

  • Ultrawideband transmission in physical channels: a broadband interference view

    The superposition of multipath components (MPC) of an emitted wave, formed by reflections from limiting surfaces and obstacles in the propagation area, strongly affects communication signals. In the case of modern wideband systems, the effect should be seen as a broadband counterpart of classical interference which is the cause of fading in narrowband systems. This paper shows that in wideband communications, the time- and frequency-domain...

    Pełny tekst do pobrania w portalu

  • Examining Feature Vector for Phoneme Recognition

    Publikacja

    - Rok 2018

    The aim of this paper is to analyze usability of descriptors coming from music information retrieval to the phoneme analysis. The case study presented consists in several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...

  • Emotions in polish speech recordings

    Dane Badawcze
    open access

    The data set presents emotions recorded in sound files that are expressions of Polish speech. Statements were made by people aged 21-23, young voices of 5 men. Each person said the following words / nie – no, oddaj - give back, podaj – pass, stop - stop, tak - yes, trzymaj -hold / five times representing a specific emotion - one of three - anger (a),...