Search results for: TWO-DIMENSIONAL REPRESENTATION OF SPEECH SIGNAL - MOST Wiedzy

  • Detecting Lombard Speech Using Deep Learning Approach

    Publication
    • K. Kąkol
    • G. Korvel
    • G. Tamulevicius
    • B. Kostek

    - SENSORS - Year 2023

    Robust detection of Lombard speech in noise is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The proposed framework combines convolutional neural networks...

    Full text available for download in the portal

  • Investigating Feature Spaces for Isolated Word Recognition

    Publication
    • P. Treigys
    • G. Korvel
    • G. Tamulevicius
    • J. Bernataviciene
    • B. Kostek

    - Year 2020

    The study addresses the issues related to the appropriateness of a two-dimensional representation of speech signal for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and...

    Full text available for download from an external service

  • Investigating Feature Spaces for Isolated Word Recognition

    Publication

    - Year 2018

    Over the past decades, researchers have given much attention to the speech processing task in automatic speech recognition (ASR). The study addresses the appropriateness of a two-dimensional representation of speech feature spaces for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation...

  • A Study of Cross-Linguistic Speech Emotion Recognition Based on 2D Feature Spaces

    Publication
    • G. Tamulevicius
    • G. Korvel
    • A. B. Yayak
    • P. Treigys
    • J. Bernataviciene
    • B. Kostek

    - Electronics - Year 2020

    In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data of different languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset of more than 10,000 emotional utterances. Despite the bi-modal character of the databases gathered, our focus is on the acoustic representation...

    Full text available for download in the portal

  • Three-dimensional representation of geographic data in a Web-based GIS

    Publication

    Presenting geographic data in a web environment has been a long-standing problem. For many years the low performance of web browsers has limited the visualization of spatial data to only two dimensions. More recently, the introduction of open standards for 3D acceleration of web applications sparked the emergence of new methods of presenting three-dimensional data in a web browser without using third-party extensions. However,...

    Full text available for download in the portal

  • Speech codec enhancements utilizing time compression and perceptual coding

    Publication

    A method for encoding a wideband speech signal employing standardized narrowband speech codecs is presented, as well as experimental results concerning detection of tonal spectral components. The speech signal, sampled at a higher rate than is suitable for a narrowband coding algorithm, is compressed in order to decrease the number of samples. Next, the time-compressed representation of a signal is encoded using a narrowband...

  • High quality speech codec employing sines+noise+transients model

    A method of high-quality wideband speech signal representation employing a sines+transients+noise model is presented. The need for a wideband speech coding approach as well as various methods for analysis and synthesis of sines, residual and transient states of speech signal is discussed. The perceptual criterion is applied in the proposed approach during encoding of sine amplitudes in order to reduce bandwidth requirements and...

    Full text available for download in the portal

  • Speech Analytics Based on Machine Learning

    In this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...

    Full text available for download from an external service

  • Visualization of short-term heart period variability with network tools as a method for quantifying autonomic drive

    Publication
    • D. Makowiec
    • B. Graff
    • A. Kaczkowska
    • G. Graff
    • D. Wejer
    • J. Wdowczyk-Szulc
    • M. Żarczyńska-Buchowiecka
    • M. Gruchała
    • Z. R. Struzik

    - Year 2017

    We argue that network methods are successful in detecting nonlinear properties in the dynamics of autonomic nocturnal regulation in short-term variability. Two modes of visualization of networks constructed from RR-increments are proposed. The first is based on the handling of a state space. The state space of RR-increments can be modified by a bin size used to code a signal and by the role of a given vertex as the representation...

    Full text available for download from an external service

  • A Quasi-2D MOSFET Model — 2D-to-Quasi-2D Transformation

    A quasi-two-dimensional (quasi-2D) representation of the MOSFET channel is proposed in this work. The representation lays the foundations for a quasi-2D MOSFET model. The quasi-2D model is a result of a 2D-to-quasi-2D transformation. The basis for the transformation is an analysis of a current density vector field and such phenomena as Gradual Channel Detachment Effect (GCDE), Channel Thickness Modulation Effect (CTME), and...

    Full text available for download from an external service

  • Playback detection using machine learning with spectrogram features approach

    Publication

    This paper presents a 2D image processing approach to playback detection in automatic speaker verification (ASV) systems using spectrograms as speech signal representation. Three feature extraction and classification methods: histograms of oriented gradients (HOG) with support vector machines (SVM), Haar wavelets with AdaBoost classifier and deep convolutional neural networks (CNN) were compared on different data partitions in respect...

    Full text available for download in the portal

  • Elimination of clicks from archive speech signals using sparse autoregressive modeling

    Publication

    This paper presents a new approach to elimination of impulsive disturbances from archive speech signals. The proposed sparse autoregressive (SAR) signal representation is given in a factorized form - the model is a cascade of the so-called formant filter and pitch filter. Such a technique has been widely used in code-excited linear prediction (CELP) systems, as it guarantees model stability. After detection of noise pulses using linear...

    Full text available for download from an external service

  • Analysis-by-synthesis paradigm evolved into a new concept

    This work aims to show how the well-known analysis-by-synthesis paradigm has recently evolved into a new concept. However, in contrast to the original idea stating that the created sound should not fail to pass the foolproof synthesis test, the recent development is a consequence of the need to create new data. Deep learning models are greedy algorithms requiring a vast amount of data that, in addition, should be correctly...

    Full text available for download from an external service

  • A quasi-2D small-signal MOSFET model - main results

    Dynamic properties of the MOS transistor under small-signal excitation are determined by kinetic parameters of the carriers injected into the channel, i.e., the low-field mobility, velocity saturation, mobility at the quiescent-point (Q-point), longitudinal electric field in the channel, by dynamic properties of the channel, as well as by an electrical coupling between the perturbed carrier concentration in the channel and the...

  • Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention

    Publication

    - Year 2021

    This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...

    Full text available for download in the portal

  • Chirp Rate and Instantaneous Frequency Estimation: Application to Recursive Vertical Synchrosqueezing

    Publication

    - IEEE SIGNAL PROCESSING LETTERS - Year 2017

    This letter introduces new chirp rate and instantaneous frequency estimators designed for frequency-modulated signals. These estimators are first investigated from a deterministic point of view, then compared together in terms of statistical efficiency. They are also used to design new recursive versions of the vertically synchrosqueezed short-time Fourier transform, using a previously published method (D. Fourer, F. Auger, and...

    Full text available for download in the portal

  • Modeling emotions for affect-aware applications

    The chapter concerns the representation and modeling of emotional states for software systems that deal with human affect. A review of emotion representation models is provided, including discrete, dimensional and componential models. The paper also provides an analysis of emotion models used in diverse types of affect-aware applications: games, mood trackers or tutoring systems. The analysis is supported with two design cases. The study...

    Full text available for download from an external service

  • Signal propagation in electromagnetic media described by fractional-order models

    In this paper, signal propagation is analysed in electromagnetic media described by fractional-order (FO) models (FOMs). Maxwell’s equations with FO constitutive relations are introduced in the time domain. Then, their phasor representation is derived for the one-dimensional case of plane wave propagation. With the use of the Fourier transformation, an algorithm for simulation of non-monochromatic wave propagation is introduced....

    Full text available for download from an external service

  • Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

    Publication
    • D. Korzekwa
    • R. Barra-Chicote
    • B. Kostek
    • T. Drugman
    • M. Łajszczak

    - Year 2019

    We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model's key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...

    Full text available for download in the portal

  • Applying the Lombard Effect to Speech-in-Noise Communication

    Publication

    - Electronics - Year 2023

    This study explored how the Lombard effect, a natural or artificial increase in speech loudness in noisy environments, can improve speech-in-noise communication. This study consisted of several experiments that measured the impact of different types of noise on synthesizing the Lombard effect. The main steps were as follows: first, a dataset of speech samples with and without the Lombard effect was collected in a controlled setting;...

    Full text available for download in the portal

  • GRAPHICAL REPRESENTATION OF MUSIC SET BASED ON MOOD OF MUSIC. GRAFICZNA PREZENTACJA ZBIORU MUZYCZNEGO OPARTA NA ANOTACJI NASTROJU MUZYKI

    Publication

    One of the features for music recommendation, which is useful and intuitive for music listeners, is “mood”. The paper presents an approach to graphical representation of mood of music pieces. Subjective evaluation based on listening tests is performed for assigning mood labels to 150 pieces of music and placing them on the 2D mood plane. As a result, a map of songs is created, where music excerpts with similar mood are organized...

  • Hidden Tensor Structures

    Publication

    - ENTROPY - Year 2024

    Any single system whose space of states is given by a separable Hilbert space is automatically equipped with infinitely many hidden tensor-like structures. This includes all quantum mechanical systems as well as classical field theories and classical signal analysis. Accordingly, systems as simple as a single one-dimensional harmonic oscillator, an infinite potential well, or a classical finite-amplitude signal of finite duration...

    Full text available for download in the portal

  • A Hopf type theorem for equivariant local maps

    We study otopy classes of equivariant local maps and prove a Hopf type theorem for such maps in the case of a real finite-dimensional orthogonal representation of a compact Lie group.

    Full text available for download in the portal

  • Multidimensional Scaling Analysis Applied to Music Mood Recognition

    Publication

    The paper presents two experiments aimed at categorizing mood associated with music. Two parts of a listening test were designed and carried out with a group of students, most of whom were users of online social music services. The initial experiment was designed to evaluate the extent to which a given label describes the mood of the particular music excerpt. The second subjective test was conducted to collect the similarity data...

  • Ranking Speech Features for Their Usage in Singing Emotion Classification

    Publication

    This paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG-7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...

    Full text available for download in the portal

  • Application of autoencoder to traffic noise analysis

    The aim of an autoencoder neural network is to transform the input data into a lower-dimensional code and then to reconstruct the output from this code representation. Applications of autoencoders to classifying sound events in road traffic have not been found in the literature. The presented research aims to determine whether such an unsupervised learning method may be used for deploying classification algorithms applied to...

    Full text available for download in the portal

  • The Hopf type theorem for equivariant gradient local maps

    Publication

    We construct a degree-type otopy invariant for equivariant gradient local maps in the case of a real finite-dimensional orthogonal representation of a compact Lie group. We prove that the invariant establishes a bijection between the set of equivariant gradient otopy classes and the direct sum of countably many copies of Z.

    Full text available for download in the portal

  • Geometric analogue of holographic reduced representation

    Publication

    - Journal of Mathematical Psychology - Year 2009

    Holographic reduced representations (HRRs) are distributed representations of cognitive structures based on superpositions of convolution-bound n-tuples. Restricting HRRs to n-tuples consisting of 1, one reinterprets the variable binding as a representation of the additive group of binary n-tuples with addition modulo 2. Since convolutions are not defined for vectors, the HRRs cannot be directly associated with geometric structures....

    Full text available for download in the portal

  • Machine Learning Applied to Aspirated and Non-Aspirated Allophone Classification—An Approach Based on Audio "Fingerprinting"

    The purpose of this study is to involve both Convolutional Neural Networks and a typical learning algorithm in the allophone classification process. A list of words including aspirated and non-aspirated allophones pronounced by native and non-native English speakers is recorded and then edited and analyzed. Allophones extracted from English speakers’ recordings are presented in the form of two-dimensional spectrogram images and...

    Full text available for download from an external service

  • Music Mood Visualization Using Self-Organizing Maps

    Due to an increasing amount of music being made available in digital form on the Internet, an automatic organization of music is sought. The paper presents an approach to graphical representation of mood of songs based on Self-Organizing Maps. Parameters describing mood of music are proposed and calculated and then analyzed employing correlation with mood dimensions based on the Multidimensional Scaling. A map is created in which...

    Full text available for download in the portal

  • Distributed Representations Based on Geometric Algebra: the Continuous Model

    Publication

    - Informatica - Year 2011

    The authors revise the concept of a distributed representation of data as well as two previously developed models: Holographic Reduced Representation (HRR) and Binary Spatter Codes (BSC). A Geometric Analogue (GAc - ''c'' stands for continuous as opposed to its discrete version) of HRR is introduced - it employs role-filler binding based on geometric products. Atomic objects are real-valued vectors in n-dimensional Euclidean space...

    Full text available for download in the portal

  • Metrisability of managing of stream-systemic processes

    Publication

    To achieve the planned goal, in order to properly describe the manufacturing system management, six process stream functions were introduced. Non-dimensional flows of these functions in time can be empirically defined during the manufacturing process. They are interpreted as non-dimensional expenses. Maximum values for these functions in properly-managed processes equal one. Also, a global management function was introduced, being...

    Full text available for download in the portal

  • An Analysis of Neural Word Representations for Wikipedia Articles Classification

    Publication

    - CYBERNETICS AND SYSTEMS - Year 2019

    One of the current popular methods of generating word representations is an approach based on the analysis of large document collections with neural networks. It creates so-called word-embeddings that attempt to learn relationships between words and encode this information in the form of a low-dimensional vector. The goal of this paper is to examine the differences between the most popular embedding models and the typical bag-of-words...

    Full text available for download from an external service

  • Urban flash flood hazard identification and assessment applying geospatial techniques and hydrodynamic modeling; Erbil city case study, Kurdistan Region of Iraq

    Publication

    - Year 2023

    This dissertation aims to investigate the factors behind flash flooding in Erbil's central district, located in the Kurdistan Region of Iraq, and develop a methodology for assessing flood hazards in the city, despite limited data accessibility. In this thesis, each factor was investigated, including an analysis of extreme precipitation events in the last two decades and their spatial and temporal rainfall distribution, intensity,...

    Full text available for download in the portal

  • Pitch estimation of narrowband-filtered speech signal using instantaneous complex frequency

    Publication

    - Year 2007

    In this paper we propose a novel method of pitch estimation, based on instantaneous complex frequency (ICF). A new iterative algorithm for the analysis of the ICF of a speech signal is presented. The results obtained are compared with commonly used methods to prove its accuracy and the connection between ICF and pitch, particularly for a narrowband-filtered speech signal.

  • Detection and localization of selected acoustic events in acoustic field for smart surveillance applications

    A method for automatic determination of position of chosen sound events such as speech signals and impulse sounds in 3-dimensional space is presented. The events are localized in the presence of sound reflections employing acoustic vector sensors. Human voice and impulsive sounds are detected using adaptive detectors based on modified peak-valley difference (PVD) parameter and sound pressure level. Localization based on signals...

    Full text available for download in the portal

  • Detection and localization of selected acoustic events in 3D acoustic field for smart surveillance applications

    A method for automatic determination of position of chosen sound events such as speech signals and impulse sounds in 3-dimensional space is presented. The events are localized in the presence of sound reflections employing acoustic vector sensors. Human voice and impulsive sounds are detected using adaptive detectors based on modified peak-valley difference (PVD) parameter and sound pressure level. Localization based on signals...

    Full text available for download from an external service

  • Process zone in the Single Cantilever Beam under transverse loading. - Part I: Theoretical analysis

    Publication

    - THEORETICAL AND APPLIED FRACTURE MECHANICS - Year 2011

    A Single Cantilever Beam (SCB) specimen loaded with a transverse force parallel to the crack front is proposed for the analysis of crack propagation phenomena under mixed mode conditions. The stress redistribution in the adhesive layer in the vicinity of the crack front as well as the beam deformation are estimated using a Timoshenko beam on elastic foundation model. This model emphasizes the Mode II contribution due to flexural beam...

    Full text available for download from an external service

  • On Fast Multi-objective Optimization of Antenna Structures Using Pareto Front Triangulation and Inverse Surrogates

    Publication

    - Year 2021

    Design of contemporary antenna systems is a challenging endeavor, where conceptual developments and initial parametric studies, interleaved with topology evolution, are followed by a meticulous adjustment of the structure dimensions. The latter is necessary to boost the antenna performance as much as possible, and often requires handling several, often conflicting objectives, pertinent to both electrical and field properties...

    Full text available for download from an external service

  • Automatic Emotion Recognition in Children with Autism: A Systematic Literature Review

    Publication

    - SENSORS - Year 2022

    The automatic emotion recognition domain brings new methods and technologies that might be used to enhance therapy of children with autism. The paper aims at the exploration of methods and tools used to recognize emotions in children. It presents a literature review study that was performed using a systematic approach and PRISMA methodology for reporting quantitative and qualitative results. Diverse observation channels and modalities...

    Full text available for download in the portal

  • Bimodal classification of English allophones employing acoustic speech signal and facial motion capture

    A method for automatic transcription of English speech into International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent the visual data related to lip reading can enhance recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...

    Full text available for download from an external service

  • On-line Search in Two-Dimensional Environment

    Publication

    We consider the following on-line pursuit-evasion problem. A team of mobile agents called searchers starts at an arbitrary node of an unknown network. Their goal is to execute a search strategy that guarantees capturing a fast and invisible intruder regardless of its movements using as few searchers as possible. As a way of modeling two-dimensional shapes, we restrict our attention to networks that are embedded into partial grids:...

    Full text available for download from an external service

  • Two-dimensional gas chromatography – principles and application in fruits analysis

    Two-dimensional gas chromatography is a rapidly developing analytical technique. One of its major uses is food analysis. The paper presents the principle of operation and history of this analytical technique. The specification of the two-dimensional gas chromatography technique has been discussed. The principles of separation of ingredients and application of the method, particularly in the analysis...

    Full text available for download from an external service

  • On exact two-dimensional kinematics for the branching shells

    Publication

    - Year 2010

    We construct the two-dimensional (2D) kinematics which is work-conjugate to the exact 2D local equilibrium conditions of the non-linear theory of branching shells. It is shown that the compatible shell displacements consist of the translation vector and rotation tensor fields defined on the regular parts of the shell base surface as well as independently on the singular surface curve modelling the shell branching. Several characteristic...

    Full text available for download from an external service

  • On the correspondence between two- and three-dimensional Eshelby tensors

    We consider both three-dimensional (3D) and two-dimensional (2D) Eshelby tensors known also as energy–momentum tensors or chemical potential tensors, which are introduced within the nonlinear elasticity and the resultant nonlinear shell theory, respectively. We demonstrate that 2D Eshelby tensor is introduced earlier directly using 2D constitutive equations of nonlinear shells and can be derived also using the through-the-thickness...

    Full text available for download in the portal

  • Multibeam data processing for 3D object shape reconstruction

    Publication

    The technology of hydroacoustic scanning offers an efficient and widely-used source of geospatial information regarding underwater environments, providing measurement data which usually have the structure of irregular groups of points known as point clouds. Since this data model has known disadvantages, a different form of representation based on representing surfaces with simple geometric structures, such as edges and facets,...

    Full text available for download in the portal

  • On-line Search in Two-Dimensional Environment

    We consider the following on-line pursuit-evasion problem. A team of mobile agents called searchers starts at an arbitrary node of an unknown network. Their goal is to execute a search strategy that guarantees capturing a fast and invisible intruder regardless of its movements using as few searchers as possible. We require that the strategy is connected and monotone, that is, at each point of the execution the part of the graph...

    Full text available for download in the portal

  • Difference in Perceived Speech Signal Quality Assessment Among Monolingual and Bilingual Teenage Students

    Publication

    - Year 2021

    User-perceived quality is a mixture of factors, including the background of an individual. The process of auditory perception is discussed in a wide variety of fields, ranging from engineering to medicine. Many studies examine the difference between musicians and non-musicians. Since musical training develops musical hearing and various other auditory capabilities, similar enhancements should be observable in the case of bilingual...

    Full text available for download from an external service

  • Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine

    Publication

    In order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as: available bandwidth, bit error rate (BER), delay, latency as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...

    Full text available for download in the portal