Search results for: ALOPHONEME ANALISYS, SPEECH PROCESSING, DYNAMIC TIME WARPING - MOST Wiedzy

Total results: 812
Selected: 725

  • System Supporting Speech Perception in Special Educational Needs Schoolchildren

    Publication

    - Year 2012

    The paper presents a system supporting speech perception during classes. The system combines a portable device, which enables real-time speech stretching, with a workstation designed to perform hearing tests. The system was designed to help children suffering from Central Auditory Processing Disorders.

    Full text available for download from an external service

  • A handwritten signature verification method employing a tablet

    Publication

    - Year 2016

    A signature verification system based on static features and time-domain functions of signals obtained using a tablet has been presented in the paper. The signature verification method, based mainly on dynamic time warping coupled with some signature image features, has been described. The FRR measures reflecting the method's efficiency have been evaluated for verification attempts performed directly after obtaining model signatures...

  • Investigating Feature Spaces for Isolated Word Recognition

    Publication

    - Year 2018

    Researchers have paid much attention to the speech processing task in automatic speech recognition (ASR) over the past decades. The study investigates the appropriateness of a two-dimensional representation of speech feature spaces for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation...

  • Acoustic Sensing Analytics Applied to Speech in Reverberation Conditions

    Publication

    The paper aims to discuss a case study of sensing analytics and technology in acoustics when applied to reverberation conditions. Reverberation is one of the issues that make speech in indoor spaces challenging to understand. This problem is particularly critical in large spaces with few absorbing or diffusing surfaces. One of the natural remedies to improve speech intelligibility in such conditions may be achieved through speaking...

    Full text available for download in the portal

  • Handwritten signature verification system employing wireless biometric pen

    Publication

    - Year 2017

    The handwritten signature verification system, which is a part of the developed multimodal biometric banking stand, is presented. The hardware component of the solution is described with a focus on the signature acquisition and on verification procedures. The signature is acquired employing an accelerometer and a gyroscope built into the biometric pen plus pressure sensors for the assessment of the proper pen grip and then the signature...

  • Introduction to the special issue on machine learning in acoustics

    Publication
    • Z. Michalopoulou
    • P. Gerstoft
    • B. Kostek
    • M. A. Roch

    - Journal of the Acoustical Society of America - Year 2021

    When we started our Call for Papers for a Special Issue on “Machine Learning in Acoustics” in the Journal of the Acoustical Society of America, our ambition was to invite papers in which machine learning was applied to all acoustics areas. The areas were listed as follows, without being limited to them: • Music and synthesis analysis • Music sentiment analysis • Music perception • Intelligent music recognition • Musical source separation • Singing...

    Full text available for download in the portal

  • Multimodal English corpus for automatic speech recognition

    A multimodal corpus developed for research on speech recognition based on audio-visual data is presented. Besides the usual video and sound excerpts, the prepared database also contains thermovision images and depth maps. All streams were recorded simultaneously, therefore the corpus makes it possible to examine the importance of the information provided by different modalities. Based on the recordings, it is also possible to develop a speech...

  • Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning

    Publication
    • K. Kąkol

    - Year 2023

    The Lombard effect is a phenomenon that results in speech intelligibility improvement when applied to noise. Many distinctive features of Lombard speech are recalled in this dissertation. This work proposes the creation of a system capable of improving speech quality and intelligibility in real time, as measured by objective metrics and subjective tests. This system consists of three main components: speech type detection,...

    Full text available for download in the portal

  • Low-Level Music Feature Vectors Embedded as Watermarks

    In this paper a method consisting in embedding low-level music feature vectors as watermarks into a musical signal is proposed. First, a review of some recent watermarking techniques and the main goals of development of digital watermarking research are provided. Then, a short overview of parameterization employed in the area of Music Information Retrieval is given. A methodology of non-blind watermarking applied to music-content...

    Full text available for download from an external service

  • An audio-visual corpus for multimodal automatic speech recognition

    A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings are provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...

    Full text available for download in the portal

  • Zastosowanie spowalniania wypowiedzi w celu poprawy rozumienia mowy przez dzieci w szkole [Application of slowed-down speech to improve speech understanding by children at school]

    Publication

    This paper presents time-scale modification algorithms that could be used for hearing impairment therapy supported by real-time speech stretching. The OLA-based algorithms and the Phase Vocoder are described. In the experimental part, the usability of those algorithms for real-time speech stretching is discussed.

  • Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically

    Publication

    - Year 2022

    The aim of this study is two-fold. First, we perform a series of experiments to examine the interference of different noises on speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Then, we apply this knowledge to detecting speech with the Lombard effect. This is for preparing a dataset for training a machine learning-based...

    Full text available for download in the portal

  • Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit

    Methods developed for real-time time scale modification (TSM) of speech signal are presented. They are based on the non-uniform, speech-rate-dependent SOLA algorithm (Synchronous Overlap and Add). The influence of the proposed method on the intelligibility of speech was investigated for two separate groups of listeners, i.e. hearing-impaired children and elderly listeners. It was shown that for the speech with average rate equal to or...

    Full text available for download in the portal

  • WYKORZYSTANIE SIECI NEURONOWYCH DO SYNTEZY MOWY WYRAŻAJĄCEJ EMOCJE [Using neural networks for the synthesis of speech expressing emotions]

    Publication

    This article presents an analysis of speech-based emotion recognition solutions and the possibilities of using them in emotional speech synthesis with neural networks. Current solutions for emotion recognition in speech and methods of speech synthesis using neural networks are presented. A significant increase in the interest in, and use of, deep learning is currently observed in applications related...

  • Detecting Lombard Speech Using Deep Learning Approach

    Publication
    • K. Kąkol
    • G. Korvel
    • G. Tamulevicius
    • B. Kostek

    - SENSORS - Year 2023

    Robust detection of Lombard speech in noise is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The framework proposed combines convolutional neural networks...

    Full text available for download in the portal

  • Strategie treningu neuronowego estymatora częstotliwości tonu krtaniowego z użyciem generatora syntetycznych samogłosek [Training strategies for a neural pitch frequency estimator using a synthetic vowel generator]

    Many telecommunication applications involve processing or analysing a speech signal, where a pitch (laryngeal tone) frequency estimator is used, often among the basic algorithms. The estimator considered in this work is based on a neural classifier that makes decisions on the basis of the frequency and instantaneous power determined in subbands of the analysed speech signal. In this work we consider...

    Full text available for download in the portal

  • Thin-walled frames and grids - statics and dynamics

    Publication

    Frames and grids assembled with thin-walled beams of open cross-section are widely applied in various civil engineering and vehicle or machine structures. Static and dynamic analysis of these structures may be carried out by means of different models, starting from the classical models made of beam elements undergoing the Kirchhoff assumptions to the FE discretization of the whole frame into plane elements. The former model is very...

  • Uniwersalny system RPG do zastosowań w przestrzeniach inteligentnych [A universal voice command recognition (RPG) system for applications in intelligent spaces]

    Publication

    - Year 2009

    The article concerns voice command recognition systems (Polish abbreviation: RPG). Two basic types of RPG systems are presented, and the choice of an architecture suitable for applications in intelligent spaces (Polish abbreviation: PI) is discussed. The Dynamic Time Warping (DTW) algorithm and the structure of the decision element of the implemented system are presented. The results of the evaluation of this system are reported.

  • Variable Ratio Sample Rate Conversion Based on Fractional Delay Filter

    Publication

    - Archives of Acoustics - Year 2014

    In this paper a sample rate conversion algorithm which allows for continuously changing resampling ratio has been presented. The proposed implementation is based on a variable fractional delay filter which is implemented by means of a Farrow structure. Coefficients of this structure are computed on the basis of fractional delay filters which are designed using the offset window method. The proposed approach allows us to freely...

    Full text available for download in the portal

  • A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times

    Publication

    Objective assessment of speech intelligibility is a complex task that requires taking into account a number of factors such as different perception of each speech sub-band by the human hearing sense or different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method used for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way...

    Full text available for download in the portal

  • A Novel Approach to the Assessment of Cough Incidence

    Publication

    In this paper we consider the problem of identification of cough events in patients suffering from chronic respiratory diseases. The information about the frequency of cough events is necessary for medical treatment. The proposed approach is based on bidirectional processing of a measured vibration signal - cough events are localized by combining the results of forward-time and backward-time analysis. The signal is at first transformed...

    Full text available for download from an external service

  • SYNTHESIZING MEDICAL TERMS – QUALITY AND NATURALNESS OF THE DEEP TEXT-TO-SPEECH ALGORITHM

    The main purpose of this study is to develop a deep text-to-speech (TTS) algorithm designated for an embedded system device. First, a critical literature review of state-of-the-art speech synthesis deep models is provided. The algorithm implementation covers both hardware and algorithmic solutions. The algorithm is designed for use with the Raspberry Pi 4 board. 80 synthesized sentences were prepared based on medical and everyday...

    Full text available for download in the portal

  • Identity verification using complex representations of handwritten signature

    This paper is devoted to handwritten signature verification using the cross-correlation approach (adopted by the authors from telecommunications) and dynamic time warping. The following invariants of the handwritten signature: the net signature, the instantaneous complex frequency and the complex cepstrum are analyzed. The problem of setting the threshold for deciding whether the current signature is authentic or forged is discussed....

  • Speech Intelligibility Measurements in Auditorium

    Publication

    Speech intelligibility was measured in Auditorium Novum at the Technical University of Gdansk (seating capacity 408, volume 3300 m³). Articulation tests were conducted; STI and Early Decay Time (EDT) coefficients were measured. Negative noise contribution to speech intelligibility was taken into account. Subjective measurements and objective tests reveal high speech intelligibility at most seats in the auditorium. A correlation was found between...

    Full text available for download in the portal

  • Time variable gain for long range sonar with chirp sounding signal

    The main purpose of applying Time Variable Gain (TVG) in active sonars with digital signal processing is to reduce the dynamic range of the echo signal and adapt it to the dynamic range of the analogue-to-digital conversion. With a high level of transmission losses, the dynamic range of the input signal in long range sonars can be very high and even exceed 200 dB. When chirp sounding signals with matched filtration are used, sonars can reach...

  • Performance Analysis of the OpenCL Environment on Mobile Platforms

    Publication

    Today’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

    Full text available for download from an external service

  • An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics

    Publication

    - Year 2019

    The speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...

    Full text available for download in the portal

  • Silence/noise detection for speech and music signals

    Publication

    - Year 2008

    This paper introduces a novel off-line algorithm for silence/noise detection in noisy signals. The main concept of the proposed algorithm is to provide noise patterns for further signal processing, i.e. noise reduction for speech enhancement. The algorithm is based on frequency domain characteristics of signals. Examples of different types of noisy signals are presented.

  • Investigating Feature Spaces for Isolated Word Recognition

    Publication
    • P. Treigys
    • G. Korvel
    • G. Tamulevicius
    • J. Bernataviciene
    • B. Kostek

    - Year 2020

    The study addresses the issues related to the appropriateness of a two-dimensional representation of speech signal for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and...

    Full text available for download from an external service

  • Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine

    Publication

    In order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as: available bandwidth, bit error rate (BER), delay, latency as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...

    Full text available for download in the portal

  • Auditory-visual attention stimulator

    A new approach to lateralization irregularities formation was proposed. The emphasis is put on the relationship between visual and auditory attention stimulation. In this approach hearing is stimulated using time scale modified speech and sight is stimulated by rendering the text of the currently heard speech. Moreover, the displayed text is modified using several techniques such as zooming and highlighting. In the experimental part of...

    Full text available for download from an external service

  • Bimodal classification of English allophones employing acoustic speech signal and facial motion capture

    A method for automatic transcription of English speech into the International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent the visual data related to lip reading can enhance recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...

    Full text available for download from an external service

  • Comparative analysis of various transformation techniques for voiceless consonants modeling

    Publication

    In this paper, a comparison of various transformation techniques, namely Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT) and Discrete Walsh Hadamard Transform (DWHT), is performed in the context of their application to voiceless consonant modeling. Speech features based on these transformation techniques are extracted. These features are mean and derivative values of cepstrum coefficients, derived from each transformation....

    Full text available for download in the portal

  • A study on signal processing methods applied to hearing aids

    Publication

    - Year 2016

    This paper presents a short survey on current technology available in hearing aids with a focus on digital signal processing techniques used. First, factors influencing the hearing aid effectiveness are introduced. Then, examples of the present DSP methods and strategies are provided. Also, a description of current limitations of hearing aids and future trends of development are shown. Finally, the notion of computational auditory...

  • Dynamic coloring of graphs

    Publication

    - FUNDAMENTA INFORMATICAE - Year 2012

    Dynamics is an inherent feature of many real life systems so it is natural to define and investigate the properties of models that reflect their dynamic nature. Dynamic graph colorings can be naturally applied in system modeling, e.g. for scheduling threads of parallel programs, time sharing in wireless networks, session scheduling in high-speed LANs, channel assignment in WDM optical networks as well as traffic scheduling. In...

  • Distortion of speech signals in the listening area: its mechanism and measurements

    Publication

    - Year 2014

    The paper deals with the influence of the number and distribution of loudspeakers in speech reinforcement systems on the quality of publicly addressed voice messages, namely on speech intelligibility in the listening area. Linear superposition of time-shifted broadband waves of the same form and slightly different magnitudes, which reach a listener from numerous coherent sources, is accompanied by interference effects...

    Full text available for download from an external service

  • Engineering Challenges in the Design of Cochlear Implants

    Publication

    - Year 2021

    Hearing aids such as cochlear implants have been used by both adults and children for a long time. In addition, cochlear implants are used by patients who have severe hearing loss either from birth or after an accident. This paper aims to investigate the engineering challenges bounding the design of cochlear implants and to present possible solutions...

  • POM/EVA Blends with Future Utility in Fused Deposition Modeling

    Publication

    - Materials - Year 2020

    Polyoxymethylene (POM) is one of the most popular thermoplastic polymers used in the industry. Therefore, the interest in its potential applications in rapid prototyping is understandable. Nevertheless, its low dimensional stability causes the warping of 3D prints, limiting its applications. This research aimed to evaluate the effects of POM modification with ethylene-vinyl acetate (EVA) (2.5, 5.0, and 7.5 wt.%) on its processing...

    Full text available for download in the portal

  • Playback detection using machine learning with spectrogram features approach

    Publication

    This paper presents a 2D image processing approach to playback detection in automatic speaker verification (ASV) systems using spectrograms as speech signal representation. Three feature extraction and classification methods: histograms of oriented gradients (HOG) with support vector machines (SVM), HAAR wavelets with AdaBoost classifier and deep convolutional neural networks (CNN) were compared on different data partitions in respect...

    Full text available for download in the portal

  • Speech Analytics Based on Machine Learning

    In this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...

    Full text available for download from an external service

  • Parallel multithread computing for spectroscopic analysis in optical coherence tomography

    Spectroscopic Optical Coherence Tomography (SOCT) is an extension of Optical Coherence Tomography (OCT). It allows gathering spectroscopic information from individual scattering points inside the sample. It is based on time-frequency analysis of interferometric signals. Such analysis requires calculating hundreds of Fourier transforms while performing a single A-scan. Additionally, further processing of acquired spectroscopic information...

    Full text available for download from an external service

  • Acceleration of decision making in sound event recognition employing supercomputing cluster

    Parallel processing of audio data streams is introduced to shorten the decision making time in hazardous sound event recognition. A supercomputing cluster environment with a framework dedicated to processing multimedia data streams in real time is used. The sound event recognition algorithms employed are based on detecting foreground events, calculating their features in short time frames, and classifying the events with Support...

    Full text available for download from an external service

  • Parallelization of video stream algorithms in kaskada platform

    Publication

    - Year 2011

    The purpose of this work is to present different techniques of video stream algorithms parallelization provided by the Kaskada platform - a novel system working in a supercomputer environment designated for multimedia streams processing. Considered parallelization methods include frame-level concurrency, multithreading and pipeline processing. Execution performance was measured on four time-consuming image recognition algorithms,...

  • Improving Control Dynamics of PMSM Drive by Estimating Zero-Delay Current Value

    Dynamic performance of current control loop still remains crucial for position-, speed-, and torque-controlled drives. In the study, a current loop solution has been designed for field oriented control of permanent magnet synchronous motors (PMSM). It enhances typical PI controller with an estimator of zero-delay current (ZDC) value. The ZDC estimation allows for selecting substantially higher controller gain. It reduces control...

    Full text available for download in the portal

  • New approach for determining the QoS of MP3-coded voice signals in IP networks

    Publication

    Present-day IP transport platforms being what they are, it will never be possible to rule out conflicts between the available services. The logical consequence of this assertion is the inevitable conclusion that the quality of service (QoS) must always be quantifiable no matter what. This paper focuses on one method to determine QoS. It defines an innovative, simple model that can evaluate the QoS of MP3-coded voice data transported...

    Full text available for download in the portal

  • Estimation of the short-term predictor parameters of speech under noisy conditions

    Publication

    Full text available for download from an external service

  • Dynamic Bayesian Networks for Symbolic Polyphonic Pitch Modeling

    Publication

    Symbolic pitch modeling is a way of incorporating knowledge about relations between pitches into the process of analyzing musical information or signals. In this paper, we propose a family of probabilistic symbolic polyphonic pitch models, which account for both the “horizontal” and the “vertical” pitch structure. These models are formulated as linear or log-linear interpolations of up to five sub-models, each of which is...

    Full text available for download from an external service

  • Genre-Based Music Language Modeling with Latent Hierarchical Pitman-Yor Process Allocation

    In this work we present a new Bayesian topic model: latent hierarchical Pitman-Yor process allocation (LHPYA), which uses hierarchical Pitman-Yor process priors for both word and topic distributions, and generalizes a few of the existing topic models, including the latent Dirichlet allocation (LDA), the bigram topic model and the hierarchical Pitman-Yor topic model. Using such priors allows for integration of n-grams with a topic model,...

    Full text available for download from an external service

  • Elimination of Impulsive Disturbances From Stereo Audio Recordings Using Vector Autoregressive Modeling and Variable-order Kalman Filtering

    This paper presents a new approach to elimination of impulsive disturbances from stereo audio recordings. The proposed solution is based on vector autoregressive modeling of audio signals. Online tracking of signal model parameters is performed using the exponentially weighted least squares algorithm. Detection of noise pulses and model-based interpolation of the irrevocably distorted samples is realized using an adaptive, variable-order...

    Full text available for download in the portal

  • Automatic music signal mixing system based on one-dimensional Wave-U-Net autoencoders

    The purpose of this paper is to show a music mixing system that is capable of automatically mixing separate raw recordings with good quality regardless of the music genre. This work recalls selected methods for automatic audio mixing first. Then, a novel deep model based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. The model is trained on a custom-prepared database. Mixes created using the...

    Full text available for download in the portal