Search results for: speech recognition, allophone, phonology, foreign language, audio features - Bridge of Knowledge

Search

Search results for: speech recognition, allophone, phonology, foreign language, audio features

Search results for: speech recognition, allophone, phonology, foreign language, audio features

  • Acceleration of decision making in sound event recognition employing supercomputing cluster

    Publication

    Parallel processing of audio data streams is introduced to shorten the decision making time in hazardous sound event recognition. A supercomputing cluster environment with a framework dedicated to processing multimedia data streams in real time is used. The sound event recognition algorithms employed are based on detecting foreground events, calculating their features in short time frames, and classifying the events with Support...

    Full text to download in external service

  • Personal adaptive tuning of mobile computer audio

    An integrated methodology for enhancing audio quality in mobile computers is presented. The key features are adaptation of the characteristics of the acoustic track to the changing conditions and to the user's individual preferences. Original signal processing algorithms are introduced, which concern: linearization of frequency response, dialogue intelligibility enhancement and dynamics processing tuned up to the user's preferences....

  • Mispronunciation Detection in Non-Native (L2) English with Uncertainty Modeling

    Publication

    - Year 2021

    A common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare it to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: a) phonemes can be recognized from speech with high accuracy, b) there is a single correct way for a sentence to be pronounced. These assumptions do not always hold, which can result...

    Full text to download in external service

  • Estimation of the short-term predictor parameters of speech under noisy conditions

    Publication

    Full text to download in external service

  • Elimination of Impulsive Disturbances From Archive Audio Signals Using Bidirectional Processing

    In this application-oriented paper we consider the problem of elimination of impulsive disturbances, such as clicks, pops and record scratches, from archive audio recordings. The proposed approach is based on bidirectional processing—noise pulses are localized by combining the results of forward-time and backward-time signal analysis. Based on the results of specially designed empirical tests (rather than on the results of theoretical analysis),...

    Full text available to download

  • Dynamic Bayesian Networks for Symbolic Polyphonic Pitch Modeling

    Publication

    Symbolic pitch modeling is a way of incorporating knowledge about relations between pitches into the process of an- alyzing musical information or signals. In this paper, we propose a family of probabilistic symbolic polyphonic pitch models, which account for both the “horizontal” and the “vertical” pitch struc- ture. These models are formulated as linear or log-linear interpo- lations of up to fi ve sub-models, each of which is...

    Full text to download in external service

  • Elimination of Impulsive Disturbances From Stereo Audio Recordings Using Vector Autoregressive Modeling and Variable-order Kalman Filtering

    This paper presents a new approach to elimination of impulsive disturbances from stereo audio recordings. The proposed solution is based on vector autoregressive modeling of audio signals. Online tracking of signal model parameters is performed using the exponential ly weighted least squares algo- rithm. Detection of noise pulses an d model-based interpolation of the irrevocably distorted sampl es is realized using an adaptive, variable-order...

    Full text available to download

  • Genre-Based Music Language Modeling with Latent Hierarchical Pitman-Yor Process Allocation

    In this work we present a new Bayesian topic model: latent hierarchical Pitman-Yor process allocation (LHPYA), which uses hierarchical Pitman-Yor pr ocess priors for both word and topic distributions, and generalizes a few of the existing topic models, including the latent Dirichlet allocation (LDA), the bi- gram topic model and the hierarchical Pitman-Yor topic model. Using such priors allows for integration of -grams with a topic model,...

    Full text to download in external service

  • Bimodal classification of English allophones employing acoustic speech signal and facial motion capture

    A method for automatic transcription of English speech into International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent the visual data related to lip reading can enhance recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...

    Full text to download in external service

  • Adaptive Personal Tuning of Sound in Mobile Computers

    An integrated methodology for enhancing audio quality in mobile computers is presented. The key features are adaptation of the characteristics of their acoustic track to changing acoustic conditions of the environment and to users’ individual preferences. Signal processing algorithms are introduced that concern: linearization of frequency response, dialogue intelligibility enhancement, and dynamics processing tuned up to the users’...

    Full text available to download

  • Performance Analysis of the OpenCL Environment on Mobile Platforms

    Publication

    Today’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

    Full text to download in external service

  • Improving the quality of speech in the conditions of noise and interference

    Publication

    The aim of the work is to present a method of intelligent modification of the speech signal with speech features expressed in noise, based on the Lombard effect. The recordings utilized sets of words and sentences as well as disturbing signals, i.e., pink noise and the so-called babble speech. Noise signal, calibrated to various levels at the speaker's ears, was played over two loudspeakers located 2 m away from the speaker. In...

    Full text to download in external service

  • Automatic sound recognition for security purposes

    Publication

    - Year 2008

    In the paper an automatic sound recognition system is presented. It forms a part of a bigger security system developed in order to monitor outdoor places for non-typical audio-visual events. The analyzed audio signal is being recorded from a microphone mounted in an outdoor place thus a non stationary noise of a significant energy is present in it. In the paper an especially designed algorithm for outdoor noise reduction is presented,...

  • In uence of Low-Level Features Extracted from Rhythmic and Harmonic Sections on Music Genre Classi cation

    Publication

    - Year 2013

    We present a comprehensive evaluation of the infuence of 'harmonic' and rhythmic sections contained in an audio file on automatic music genre classi cation. The study is performed using the ISMIS database composed of music files, which are represented by vectors of acoustic parameters describing low-level music features. Non-negative Matrix Factorization serves for blind separation of instrument components. Rhythmic components...

  • Enhanced voice user interface employing spatial filtration of signals from acoustic vector sensor

    Spatial filtration of sound is introduced to enhance speech recognition accuracy in noisy conditions. An acoustic vector sensor (AVS) is employed. The signals from the AVS probe are processed in order to attenuate the surrounding noise. As a result the signal to noise ratio is increased. An experiment is featured in which speech signals are disturbed by babble noise. The signals before and after spatial filtration are processed...

    Full text to download in external service

  • Audio-visual aspect of the Lombard effect and comparison with recordings depicting emotional states.

    In this paper an analysis of audio-visual recordings of the Lombard effect is shown. First, audio signal is analyzed indicating the presence of this phenomenon in the recorded sessions. The principal aim, however, was to discuss problems related to extracting differences caused by the Lombard effect, present in the video , i.e. visible as tension and work of facial muscles aligned to an increase in the intensity of the articulated...

    Full text to download in external service

  • Discovering Rule-Based Learning Systems for the Purpose of Music Analysis

    Publication

    Music analysis and processing aims at understanding information retrieved from music (Music Information Retrieval). For the purpose of music data mining, machine learning (ML) methods or statistical approach are employed. Their primary task is recognition of musical instrument sounds, music genre or emotion contained in music, identification of audio, assessment of audio content, etc. In terms of computational approach, music databases...

    Full text available to download

  • WYKORZYSTANIE SIECI NEURONOWYCH DO SYNTEZY MOWY WYRAŻAJĄCEJ EMOCJE

    Publication

    - Year 2018

    W niniejszym artykule przedstawiono analizę rozwiązań do rozpoznawania emocji opartych na mowie i możliwości ich wykorzystania w syntezie mowy z emocjami, wykorzystując do tego celu sieci neuronowe. Przedstawiono aktualne rozwiązania dotyczące rozpoznawania emocji w mowie i metod syntezy mowy za pomocą sieci neuronowych. Obecnie obserwuje się znaczny wzrost zainteresowania i wykorzystania uczenia głębokiego w aplikacjach związanych...

  • Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech

    Publication
    • D. Piotrowski
    • R. Korzeniowski
    • A. Falai
    • S. Cygert
    • K. Pokora
    • G. Tinchev
    • Z. Zhang
    • K. Yanagisawa

    - Year 2023

    In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...

    Full text to download in external service

  • Rough Sets Applied to Mood of Music Recognition

    Publication

    - Year 2016

    With the growth of accessible digital music libraries over the past decade, there is a need for research into automated systems for searching, organizing and recommending music. Mood of music is considered as one of the most intuitive criteria for listeners, thus this work is focused on the emotional content of music and its automatic recognition. The research study presented in this work contains an attempt to music emotion recognition...

  • Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine

    Publication

    - Archives of Acoustics - Year 2023

    In order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as: available bandwidth, bit error rate (BER), delay, latency as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...

    Full text available to download

  • Intelligent multimedia solutions supporting special education needs.

    The role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....

  • Intelligent video and audio applications for learning enhancement

    The role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....

    Full text to download in external service

  • Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning

    Publication
    • K. Kąkol

    - Year 2023

    The Lombard effect is a phenomenon that results in speech intelligibility improvement when applied to noise. There are many distinctive features of Lombard speech that were recalled in this dissertation. This work proposes the creation of a system capable of improving speech quality and intelligibility in real-time measured by objective metrics and subjective tests. This system consists of three main components: speech type detection,...

    Full text available to download

  • Creating a Realible Music Discovery and Recomendation System

    The aim of this paper is to show problems related to creating a reliable music dis-covery system. The SYNAT database that contains audio files is used for the purpose of experiments. The files are divided into 22 classes corresponding to music genres with different cardinality. Of utmost importance for a reliable music recommendation system are the assignment of audio files to their appropriate gen-res and optimum parameterization...

    Full text to download in external service

  • Acoustic Sensing Analytics Applied to Speech in Reverberation Conditions

    Publication

    The paper aims to discuss a case study of sensing analytics and technology in acoustics when applied to reverberation conditions. Reverberation is one of the issues that makes speech in indoor spaces challenging to understand. This problem is particularly critical in large spaces with few absorbing or diffusing surfaces. One of the natural remedies to improve speech intelligibility in such conditions may be achieved through speaking...

    Full text available to download

  • Building Knowledge for the Purpose of Lip Speech Identification

    Consecutive stages of building knowledge for automatic lip speech identification are shown in this study. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared taking into account the frequencies of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...

    Full text to download in external service

  • Krzysztof Goczyła prof. dr hab. inż.

    Krzysztof Goczyła, full professor of Gdańsk University of Technology, computer scientist, a specialist in software engineering, knowledge engineering and databases. He graduated from the Faculty of Electronics Technical University of Gdansk in 1976 with a degree in electronic engineering, specializing in automation. Since then he has been working at Gdańsk University of Technology. In 1982 he obtained a doctorate in computer science...

  • Constructing a Dataset of Speech Recordingswith Lombard Effect

    Publication

    - Year 2020

    Thepurpose of therecordings was to create a speech corpus based on the ISLEdataset, extended with video and Lombard speech. Selected from a set of 165sentences, 10, evaluatedas having thehighest possibility to occur in the context ofthe Lombard effect,were repeated in the presence of the so-called babble speech to obtain Lombard speech features. Altogether,15speakers were recorded, and speech parameterswere...

  • Automatic Classification of Polish Sign Language Words

    In the article we present the approach to automatic recognition of hand gestures using eGlove device. We present the research results of the system for detection and classification of static and dynamic words of Polish language. The results indicate the usage of eGlove allows to gain good recognition quality that additionally can be improved using additional data sources such as RGB cameras.

    Full text available to download

  • Voice command recognition using hybrid genetic algorithm

    Publication

    Abstract: Speech recognition is a process of converting the acoustic signal into a set of words, whereas voice command recognition consists in the correct identification of voice commands, usually single words. Voice command recognition systems are widely used in the military, control systems, electronic devices, such as cellular phones, or by people with disabilities (e.g., for controlling a wheelchair or operating a computer...

    Full text available to download

  • Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions

    Publication

    - Year 2018

    The aim of the work is to analyze Lombard speech effect in recordings and then modify the speech signal in order to obtain an increase in the improvement of objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of the Lombard speech, and in particular on the effect of increasing the fundamental frequency...

  • Objective Programming EMSS 2023

    e-Learning Courses
    • E. Lubecka

    Theory and practice on object oriented programmingSoftware programming paradigms including object oriented approachEncapsulation, inheritance, abstraction and polymorphism in C++ languageSpecific features of C++ obiect-orientationDynamic memory management in C++ languagePython as a scripting object oriented languageComparison of C++ and Python languages to Java and C#

  • An Attempt to Create Speech Synthesis Model That Retains Lombard Effect Characteristics

    Publication

    - Year 2019

    The speech with the Lombard effect has been extensively studied in the context of speech recognition or speech enhancement. However, few studies have investigated the Lombard effect in the context of speech synthesis. The aim of this paper is to create a mathematical model that allows for retaining the Lombard effect. These models could be used as a basis of a formant speech synthesizer. The proposed models are based on dividing...

    Full text available to download

  • Individual Resources and Intercultural Interactions

    Publication

    - Year 2017

    The work environment in multinational corporations (MNCs) is specific and demanding including intercultural interactions with co-workers and clients and using a foreign language. Some individual resources can help in dealing with these circumstances. Individual resources refer to personal dispositions, competencies and prior experiences. With regard to previous studies, a caravan of personal resources, namely Psychological Capital...

  • Analiza stanu nawierzchni i klas pojazdów na podstawie parametrów ekstrahowanych z sygnału fonicznego

    Celem badań jest poszukiwanie parametrów wektora cech ekstrahowanego z sygnału fonicznego w kontekście automatycznego rozpoznawania stanu nawierzchni jezdni oraz typu pojazdów. W pierwszej kolejności przedstawiono wpływ warunków pogodowych na charakterystykę widmową sygnału fonicznego rejestrowanego przy przejeżdżających pojazdach. Następnie, dokonano parametryzacji sygnału fonicznego oraz przeprowadzano analizę korelacyjną w celu...

    Full text available to download

  • A Review of Emotion Recognition Methods Based on Data Acquired via Smartphone Sensors

    Publication

    In recent years, emotion recognition algorithms have achieved high efficiency, allowing the development of various affective and affect-aware applications. This advancement has taken place mainly in the environment of personal computers offering the appropriate hardware and sufficient power to process complex data from video, audio, and other channels. However, the increase in computing and communication capabilities of smartphones,...

    Full text available to download

  • From Sequential to Parallel Implementation of NLP Using the Actor Model

    The article focuses on presenting methods allowing easy parallelization of an existing, sequential Natural Language Processing (NLP) application within a multi-core system. The actor-based solution implemented with the Akka framework has been applied and compared to an application based on Task Parallel Library (TPL) and to the original sequential application. Architectures, data and control flows are described along with execution...

    Full text available to download

  • AUTOMATYCZNA KLASYFIKACJA MOWY PATOLOGICZNEJ

    Publication

    Aplikacja przedstawiona w niniejszym rozdziale służy do automatycznego wykrywania mowy patologicznej na podstawie bazy nagrań. W pierwszej kolejności przedstawiono założenia leżące u podstaw przeprowadzonych badan wraz z wyborem bazy mowy patologicznej. Zaprezentowano również zastosowane algorytmy oraz cechy sygnału mowy, które pozwalają odróżnić mowę niezaburzoną od mowy patologicznej. Wytrenowane sieci neuronowe zostały następnie...

    Full text to download in external service

  • Applying the Lombard Effect to Speech-in-Noise Communication

    Publication

    - Electronics - Year 2023

    This study explored how the Lombard effect, a natural or artificial increase in speech loudness in noisy environments, can improve speech-in-noise communication. This study consisted of several experiments that measured the impact of different types of noise on synthesizing the Lombard effect. The main steps were as follows: first, a dataset of speech samples with and without the Lombard effect was collected in a controlled setting;...

    Full text available to download

  • Noise profiling for speech enhancement employing machine learning models

    Publication

    - Journal of the Acoustical Society of America - Year 2022

    This paper aims to propose a noise profiling method that can be performed in near real-time based on machine learning (ML). To address challenges related to noise profiling effectively, we start with a critical review of the literature background. Then, we outline the experiment performed consisting of two parts. The first part concerns the noise recognition model built upon several baseline classifiers and noise signal features...

    Full text available to download

  • Detection, classification and localization of acoustic events in the presence of background noise for acoustic surveillance of hazardous situations

    Evaluation of sound event detection, classification and localization of hazardous acoustic events in the presence of background noise of different types and changing intensities is presented. The methods for discerning between the events being in focus and the acoustic background are introduced. The classifier, based on a Support Vector Machine algorithm, is described. The set of features and samples used for the training of the...

    Full text available to download

  • Magdalena Szuflita-Żurawska

    Head of the Scientific and Technical Information Services at the Gdansk University of Technology Library and the Leader of the Open Science Competence Center. She is also a Plenipotentiary of the Rector of the Gdańsk University of Technology for open science.  She is a PhD Candidate. Her main areas of research and interests include research productivity, motivation, management of HEs, Open Access, Open Research Data, information...

  • Variable Ratio Sample Rate Conversion Based on Fractional Delay Filter

    Publication

    - Archives of Acoustics - Year 2014

    In this paper a sample rate conversion algorithm which allows for continuously changing resampling ratio has been presented. The proposed implementation is based on a variable fractional delay filter which is implemented by means of a Farrow structure. Coefficients of this structure are computed on the basis of fractional delay filters which are designed using the offset window method. The proposed approach allows us to freely...

    Full text available to download

  • Testing Watermark Robustness against Application of Audio Restoration Algorithms

    Publication

    The purpose of this study was to test to what extent watermarks embedded in distorted audio signals are immune to audio restoration algorithm performing. Several restoration routines such as noise reduction, spectrum expansion, clipping or clicks reduction were applied in the online website system. The online service was extended with some copyright protection mechanisms proposed by the authors. They contain low-level music features...

    Full text to download in external service

  • Using Physiological Signals for Emotion Recognition

    Publication

    - Year 2013

    Recognizing user’s emotions is the promising area of research in a field of human-computer interaction. It is possible to recognize emotions using facial expression, audio signals, body poses, gestures etc. but physiological signals are very useful in this field because they are spontaneous and not controllable. In this paper a problem of using physiological signals for emotion recognition is presented. The kinds of physiological...

    Full text to download in external service

  • PHONEME DISTORTION IN PUBLIC ADDRESS SYSTEMS

    Publication

    - Year 2015

    The quality of voice messages in speech reinforcement and public address systems is often poor. The sound engineering projects of such systems take care of sound intensity and possible reverberation phenomena in public space without, however, considering the influence of acoustic interference related to the number and distribution of loudspeakers. This paper presents the results of measurements and numerical simulations of the...

  • Tensor Decomposition for Imagined Speech Discrimination in EEG

    Publication

    - LECTURE NOTES IN COMPUTER SCIENCE - Year 2018

    Most of the researches in Electroencephalogram(EEG)-based Brain-Computer Interfaces (BCI) are focused on the use of motor imagery. As an attempt to improve the control of these interfaces, the use of language instead of movement has been recently explored, in the form of imagined speech. This work aims for the discrimination of imagined words in electroencephalogram signals. For this purpose, the analysis of multiple variables...

    Full text to download in external service

  • Orken Mamyrbayev Professor

    People

    1.  Education: Higher. In 2001, graduated from the Abay Almaty State University (now Abay Kazakh National Pedagogical University), in the specialty: Computer science and computerization manager. 2.  Academic degree: Ph.D. in the specialty "6D070300-Information systems". The dissertation was defended in 2014 on the topic: "Kazakh soileulerin tanudyn kupmodaldy zhuyesin kuru". Under my supervision, 16 masters, 1 dissertation...

  • DEVELOPMENT OF THE ALGORITHM OF POLISH LANGUAGE FILM REVIEWS PREPROCESSING

    The algorithm and the software for conducting the procedure of Preprocessing of the reviews of films in the Polish language were developed. This algorithm contains the following steps: Text Adaptation Procedure; Procedure of Tokenization; Procedure of Transforming Words into the Byte Format; Part-of-Speech Tagging; Stemming / Lemmatization Procedure; Presentation of Documents in the Vector Form (Vector Space Model) Procedure; Forming...

    Full text available to download