Search results for: automatic speech recognition - Bridge of Knowledge

Search

Search results for: automatic speech recognition

Filters

total: 1555
filtered: 1142

clear all filters


Chosen catalog filters

  • Category

  • Year

  • Options

clear Chosen catalog filters disabled

Search results for: automatic speech recognition

  • Musical Instrument Tagging Using Data Augmentation and Effective Noisy Data Processing

    Developing signal processing methods to extract information automatically has potential in several applications, for example searching for multimedia based on its audio content, making context-aware mobile applications (e.g., tuning apps), or pre-processing for an automatic mixing system. However, the last-mentioned application needs a significant amount of research to reliably recognize real musical instruments in recordings....

    Full text available to download

  • Music Mood Visualization Using Self-Organizing Maps

    Publication

    Due to an increasing amount of music being made available in digital form in the Internet, an automatic organization of music is sought. The paper presents an approach to graphical representation of mood of songs based on Self-Organizing Maps. Parameters describing mood of music are proposed and calculated and then analyzed employing correlation with mood dimensions based on the Multidimensional Scaling. A map is created in which...

    Full text available to download

  • Robot-Based Intervention for Children With Autism Spectrum Disorder: A Systematic Literature Review

    Publication
    • K. D. Bartl-Pokorny
    • P. Uluer
    • D. E. Barkana
    • A. Baird
    • H. Kose
    • T. Zorcec
    • B. Robins
    • B. Schuller
    • A. Landowska
    • M. Pykała

    - IEEE Access - Year 2021

    Children with autism spectrum disorder (ASD) have deficits in the socio-communicative domain and frequently face severe difficulties in the recognition and expression of emotions. Existing literature suggested that children with ASD benefit from robot-based interventions. However, studies varied considerably in participant characteristics, applied robots, and trained skills. Here, we reviewed robot-based interventions targeting...

    Full text available to download

  • Identification of acoustic event of selected noise sources in a long-term environmental monitoring systems

    Publication
    • M. Kłaczyński
    • W. Cioch
    • T. Wszołek
    • W. Wszołek
    • D. Mleczko
    • P. Pawlik
    • A. Grzeczka

    - Year 2014

    ABSTRACT Undertaking long-term acoustic measurements on sites located near an airport is related to a problem of large quantities of recorded data, which very often represents information not related to flight operations. In such areas, usually defined as zone of limited use, often other sources of noise exist, such as roads or railway lines treated is such context as acoustic background. Manual verification of such recorded data...

  • Development and tuning of irregular divide-and-conquer applications in DAMPVM/DAC

    Publication

    - Year 2002

    This work presents implementations and tuning experiences with parallel irregular applications developed using the object oriented framework DAM-PVM/DAC. It is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime and dynamic mapping to processors taking into account their speeds and even loads by other user processes. New implementations of parallel applications...

    Full text to download in external service

  • Semi complex navigation with an active optical gesture sensor

    This paper presents the methods of diversified touchless interactions between a user and a mobile platform utilizing the optical gesture sensor. The sensor uses 8 photodiodes to measure the reflected light in the active mode (using embedded LEDs) or it measures shadows caused by fingers in the passive mode. Several algorithms were implemented: automatic mode switching, adaptive illumination level compensation, resolution improvements...

    Full text to download in external service

  • In uence of Low-Level Features Extracted from Rhythmic and Harmonic Sections on Music Genre Classi cation

    Publication

    - Year 2013

    We present a comprehensive evaluation of the infuence of 'harmonic' and rhythmic sections contained in an audio file on automatic music genre classi cation. The study is performed using the ISMIS database composed of music files, which are represented by vectors of acoustic parameters describing low-level music features. Non-negative Matrix Factorization serves for blind separation of instrument components. Rhythmic components...

  • Performance Analysis of the OpenCL Environment on Mobile Platforms

    Publication

    Today’s smartphones have more and more features that so far were only assigned to personal computers. Every year these devices are composed of better and more efficient components. Everything indicates that modern smartphones are replacing ordinary computers in various activities. High computing power is required for tasks such as image processing, speech recognition and object detection. This paper analyses the performance of...

    Full text to download in external service

  • Separability Assessment of Selected Types of Vehicle-Associated Noise

    Music Information Retrieval (MIR) area as well as development of speech and environmental information recognition techniques brought various tools in-tended for recognizing low-level features of acoustic signals based on a set of calculated parameters. In this study, the MIRtoolbox MATLAB tool, designed for music parameter extraction, is used to obtain a vector of parameters to check whether they are suitable for separation of...

    Full text to download in external service

  • The Hough transform in the classification process of inland ships

    Publication

    This article presents an analysis of the possibilities of using image processing methods for feature extraction that allows kNN classification based on a ship’s image delivered from an on-water video surveillance system. The subject of the analysis is the Hough transform which enables the detection of straight lines in an image. The recognized straight lines and the information about them serve as features in the classification...

    Full text available to download

  • Towards More Realistic Probabilistic Models for Data Structures: The External Path Length in Tries under the Markov Model

    Publication

    - Year 2013

    Tries are among the most versatile and widely used data structures on words. They are pertinent to the (internal) structure of (stored) words and several splitting procedures used in diverse contexts ranging from document taxonomy to IP addresses lookup, from data compression (i.e., Lempel- Ziv'77 scheme) to dynamic hashing, from partial-match queries to speech recognition, from leader election algorithms to distributed hashing...

  • Detecting Apples in the Wild: Potential for Harvest Quantity Estimation

    Publication
    • A. Janowski
    • R. Kaźmierczak
    • C. Kowalczyk
    • J. Szulwic

    - Sustainability - Year 2021

    Knowing the exact number of fruits and trees helps farmers to make better decisions in their orchard production management. The current practice of crop estimation practice often involves manual counting of fruits (before harvesting), which is an extremely time-consuming and costly process. Additionally, this is not practicable for large orchards. Thanks to the changes that have taken place in recent years in the field of image...

    Full text available to download

  • Audio content analysis in the urban area telemonitoring system

    Publication

    Artykuł przedstawia możliwości rozwinięcie monitoringu miejskiego o automatyczną analizę dźwięku. Przedstawiono metody parametryzacji dźwięku, które możliwe są do zastosowania w takim systemie oraz omówiono aspekty techniczne implementacji. W kolejnej części przedstawiono system decyzyjny oparty na drzewach zastosowany w systemie. System ten rozpoznaje dźwięki niebezpieczne (strzał, rozbita szyba, krzyk) wśród dźwięków zarejestrowanych...

    Full text to download in external service

  • Smart Virtual Bass Synthesis Algorithm Based on Music Genre Classification

    Publication

    The aim of this paper is to present a novel approach to the Virtual Bass Synthesis (VBS) algorithms applied to portable computers. The proposed algorithm employed automatic music genre recognition to determine the optimum parameters for the synthesis of additional frequencies. The synthesis was carried out using the non-linear device (NLD) and phase vocoder (PV) methods depending on the music excerpt genre. Classification of musical...

  • Selection of an artificial pre-training neural network for the classification of inland vessels based on their images

    Publication

    - Zeszyty Naukowe Akademii Morskiej w Szczecinie - Year 2021

    Artificial neural networks (ANN) are the most commonly used algorithms for image classification problems. An image classifier takes an image or video as input and classifies it into one of the possible categories that it was trained to identify. They are applied in various areas such as security, defense, healthcare, biology, forensics, communication, etc. There is no need to create one’s own ANN because there are several pre-trained...

    Full text available to download

  • Video content analysis in the urban area telemonitoring system

    Publication

    The task of constant monitoring of video streams from a large number of cameras and reviewing the recordings in order to find a specified event requires a considerable amount of time and effort from the system operators and it is prone to errors. A solution to this problem is an automatic system for constant analysis of camera images being able to raise an alarm if a predefined event is detected. The chapter presents various aspects...

    Full text to download in external service

  • Computer-assisted pronunciation training—Speech synthesis is almost all you need

    Publication

    - SPEECH COMMUNICATION - Year 2022

    The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation errors with high...

    Full text available to download

  • Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement

    Publication

    - IEEE Access - Year 2020

    The Lombard effect is one of the most well-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech...

    Full text available to download

  • Visual Features for Endoscopic Bleeding Detection

    Aims: To define a set of high-level visual features of endoscopic bleeding and evaluate their capabilities for potential use in automatic bleeding detection. Study Design: Experimental study. Place and Duration of Study: Department of Computer Architecture, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, between March 2014 and May 2014. Methodology: The features have...

    Full text available to download

  • Using Convolutional Neural Networks for Corneal Arcus Detection Towards Familial Hypercholesterolemia Screening

    Publication

    Familial hypercholesterolemia (FH) is a highly undiagnosed disease. Among FH patients, the onset of premature coronary artery disease is 13 times higher than in the general population. Early diagnosis and treatment is essential to prevent cardiovascular diseases and their complications, and to prolong life. One of the clinical criteria of FH is the occurrence of a corneal arcus (CA) among patients, especially those under 45 years...

    Full text available to download

  • DIAGNOSIS OF MALIGNANT MELANOMA BY NEURAL NETWORK ENSEMBLE-BASED SYSTEM UTILISING HAND-CRAFTED SKIN LESION FEATURES

    Malignant melanomas are the most deadly type of skin cancer but detected early have high chances for successful treatment. In the last twenty years, the interest of automated melanoma recognition detection and classification dynamically increased partially because of public datasets appearing with dermatoscopic images of skin lesions. Automated computer-aided skin cancer detection in dermatoscopic images is a very challenging task...

    Full text available to download

  • Speech Intelligibility Measurements in Auditorium

    Publication

    Speech intelligibility was measured in Auditorium Novum on Technical University of Gdansk (seating capacity 408, volume 3300 m3). Articulation tests were conducted; STI and Early Decay Time EDT coefficients were measured. Negative noise contribution to speech intelligibility was taken into account. Subjective measurements and objective tests reveal high speech intelligibility at most seats in auditorium. Correlation was found between...

    Full text available to download

  • Transient detection for speech coding applications

    Signal quality in speech codecs may be improved by selecting transients from speech signal and encoding them using a suitable method. This paper presents an algorithm for transient detection in speech signal. This algorithm operates in several frequency bands. Transient detection functions are calculated from energy measured in short frames of the signal. The final selection of transient frames is based on results of detection...

    Full text to download in external service

  • Improving the quality of speech in the conditions of noise and interference

    Publication

    The aim of the work is to present a method of intelligent modification of the speech signal with speech features expressed in noise, based on the Lombard effect. The recordings utilized sets of words and sentences as well as disturbing signals, i.e., pink noise and the so-called babble speech. Noise signal, calibrated to various levels at the speaker's ears, was played over two loudspeakers located 2 m away from the speaker. In...

    Full text to download in external service

  • Constructing a Dataset of Speech Recordingswith Lombard Effect

    Publication

    - Year 2020

    Thepurpose of therecordings was to create a speech corpus based on the ISLEdataset, extended with video and Lombard speech. Selected from a set of 165sentences, 10, evaluatedas having thehighest possibility to occur in the context ofthe Lombard effect,were repeated in the presence of the so-called babble speech to obtain Lombard speech features. Altogether,15speakers were recorded, and speech parameterswere...

  • Improved method for real-time speech stretching

    Publication

    n algorithm for real-time speech stretching is presented. It was designed to modify input signal dependently on its content and on its relation with the historical input data. The proposed algorithm is a combination of speech signal analysis algorithms, i.e. voice, vowels/consonants, stuttering detection and SOLA (Synchronous-Overlap-and-Add) based speech stretching algorithm. This approach enables stretching input speech signal...

    Full text to download in external service

  • The Impact of Weather on Traffic Speed in Urban Area

    The issue of the impact of weather conditions on trip speed of vehicles has been studied for a long time and it is still the subject of many scientific researches. The impact of atmospheric conditions on the speed with which drivers drive their vehicles seems to be obvious. Good weather conditions, sunny weather with good visibility surely provokes higher speed while rainfall, wind...

    Full text available to download

  • Real-time speech-rate modification experiments

    Publication

    An algorithm designed for real-time speech time scale modification (stretching) is proposed, providing a combination of typical synchronous overlap and add based time scale modification algorithm and signal redundancy detection algorithms that allow to remove parts of the speech signal and replace them with the stretched speech signal fragments. Effectiveness of signal processing algorithms are examined experimentally together...

    Full text to download in external service

  • Improving Objective Speech Quality Indicators in Noise Conditions

    Publication

    - Year 2020

    This work aims at modifying speech signal samples and test them with objective speech quality indicators after mixing the original signals with noise or with an interfering signal. Modifications that are applied to the signal are related to the Lombard speech characteristics, i.e., pitch shifting, utterance duration changes, vocal tract scaling, manipulation of formants. A set of words and sentences in Polish, recorded in silence,...

    Full text to download in external service

  • Speech synthesis controlled by eye gazing

    Publication

    - Year 2010

    A method of communication based on eye gaze controlling is presented. Investigations of using gaze tracking have been carried out in various context applications. The solution proposed in the paper could be referred to as ''talking by eyes'' providing an innovative approach in the domain of speech synthesis. The application proposed is dedicated to disabled people, especially to persons in a so-called locked-in syndrome who cannot...

  • Detecting Lombard Speech Using Deep Learning Approach

    Publication
    • K. Kąkol
    • G. Korvel
    • G. Tamulevicius
    • B. Kostek

    - SENSORS - Year 2023

    Robust Lombard speech-in-noise detecting is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, assumptions of the work performed for Lombard speech detection are outlined. The framework proposed combines convolutional neural networks...

    Full text available to download

  • Acoustic Sensing Analytics Applied to Speech in Reverberation Conditions

    Publication

    The paper aims to discuss a case study of sensing analytics and technology in acoustics when applied to reverberation conditions. Reverberation is one of the issues that makes speech in indoor spaces challenging to understand. This problem is particularly critical in large spaces with few absorbing or diffusing surfaces. One of the natural remedies to improve speech intelligibility in such conditions may be achieved through speaking...

    Full text available to download

  • Time-domain prosodic modifications for text-to-speech synthesizer

    Publication

    - Year 2010

    An application of prosodic speech processing algorithms to Text-To-Speech synthesis is presented. Prosodic modifications that improve the naturalness of the synthesized signal are discussed. The applied method is based on the TD-PSOLA algorithm. The developed Text-To-Speech Synthesizer is used in applications employing multimodal computer interfaces.

  • A Method of Real-Time Non-uniform Speech Stretching

    Publication

    Developed method of real-time non-uniform speech stretching is presented.The proposed solution is based on the well-known SOLA algorithm(Synchronous Overlap and Add). Non-uniform time-scale modification isachieved by the adjustment of time scaling factor values in accordance with thesignal content. Dependently on the speech unit (vowels/consonants), instantaneousrate of speech (ROS), and speech signal presence, values of the scalingfactor...

    Full text to download in external service

  • STANY NIEUSTALONE TOWARZYSZĄCE POMIAROWI IMPEDANCJI PĘTLI ZWARCIA W OBWODACH WYJŚCIOWYCH ZASILACZY BEZPRZERWOWYCH UPS

    W pracy przedstawiono metodykę i wyniki pomiarów stanów nieustalonych w zasilaczu bezprzerwowym (UPS) typu on - line. Do rejestracji zdarzeń po stronie zasilania i na wyjściu UPS wykorzystano dwa przyrządy do pomiaru jakości energii elektrycznej zsynchronizowane czasowo. Rejestratory kompresują uzyskane dane pomiarowe, co może wprowadzać dodatkowe błędy pomiaru wielkości mierzonych – napięć i prądów. Dodatkowe rejestracje oscyloskopem...

    Full text available to download

  • Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

    Publication
    • D. Korzekwa
    • R. Barra-Chicote
    • B. Kostek
    • T. Drugman
    • M. Łajszczak

    - Year 2019

    We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not...

    Full text available to download

  • Comparison of various speech time-scale modificartion methods

    The objective of this work is to investigate the influence of the different time-scale modification (TSM) methods on the quality of the speech stretched up using the designed non-uniform real-time speech time-scale modification algorithm (NU-RTSM). The algorithm provides a combination of the typical TSM algorithm with the vowels, consonants, stutter, transients and silence detectors. Based on the information about the content and...

  • Speech codec enhancements utilizing time compression and perceptual coding

    Publication

    A method for encoding wideband speech signal employing standardized narrowband speech codecs is presented as well as experimental results concerning detection of tonal spectral components. The speech signal sampled with a higher sampling rate than it is suitable for narrowband coding algorithm is compressed in order to decrease the amount of samples. Next, the time-compressed representation of a signal is encoded using a narrowband...

  • Tensor Decomposition for Imagined Speech Discrimination in EEG

    Publication

    - LECTURE NOTES IN COMPUTER SCIENCE - Year 2018

    Most of the researches in Electroencephalogram(EEG)-based Brain-Computer Interfaces (BCI) are focused on the use of motor imagery. As an attempt to improve the control of these interfaces, the use of language instead of movement has been recently explored, in the form of imagined speech. This work aims for the discrimination of imagined words in electroencephalogram signals. For this purpose, the analysis of multiple variables...

    Full text to download in external service

  • Methods of Improving Speech Intelligibility for Listeners with Hearing Resolution Deficit

    Methods developed for real-time time scale modification (TSM) of speech signal are presented. They are based onthe non-uniform, speech rate depended SOLA algorithm (Synchronous Overlap and Add). Influence of theproposed method on the intelligibility of speech was investigated for two separate groups of listeners, i.e. hearingimpaired children and elderly listeners. It was shown that for the speech with average rate equal to or...

    Full text available to download

  • Recognition and sensing of anions

    Publication

    Molecular ion recognition is one of the most intensively studied areas of supramolecular technology. The reason for this is the essential role that ions play in many biological as well as industrial processes. On the other hand, however, it has been proved that ions can have a negative impact on human health and the environment. For these reasons, it is extremly important to develop rapid and simple methods allowing the determination...

  • Ranking Speech Features for Their Usage in Singing Emotion Classification

    Publication

    - Year 2020

    This paper aims to retrieve speech descriptors that may be useful for the classification of emotions in singing. For this purpose, Mel Frequency Cepstral Coefficients (MFCC) and selected Low-Level MPEG 7 descriptors were calculated based on the RAVDESS dataset. The database contains recordings of emotional speech and singing of professional actors presenting six different emotions. Employing the algorithm of Feature Selection based...

    Full text available to download

  • Human emotion recognition with biosignals

    Publication

    - Year 2022

    This chapter presents issues in the field of affective computing. Basic preliminary information for the recognition of emotions is given and models of emotions, various ways of evoking emotions, as well as their theoretical foundations are discussed. The particular attention is given to the use of physiological signals in recognizing emotions. This subject is outlined further below by presenting selected biosignals, their relationship...

    Full text to download in external service

  • System Supporting Speech Perception in Special Educational Needs Schoolchildren

    Publication

    - Year 2012

    The system supporting speech perception during the classes is presented in the paper. The system is a combination of portable device, which enables real-time speech stretching, with the workstation designed in order to perform hearing tests. System was designed to help children suffering from Central Auditory Processing Disorders.

    Full text to download in external service

  • Silence/noise detection for speech and music signals

    Publication

    - Year 2008

    This paper introduces a novel off-line algorithm for silence/noise detection in noisy signals. The main concept of the proposed algorithm is to provide noise patterns for further signals processing i.e. noise reduction for speech enhancement. The algorithm is based on frequency domain characteristics of signals. The examples of different types of noisy signals are presented.

  • High quality speech codec employing sines+noise+transients model

    A method of high quality wideband speech signal representation employing sines+transients+noise model is presented. The need for a wideband speech coding approach as well as various methods for analysis and synthesis of sines, residual and transient states of speech signal is discussed. The perceptual criterion is applied in the proposed approach during encoding of sines amplitudes in order to reduce bandwidth requirements and...

    Full text available to download

  • Virtual keyboard controlled by eye gaze employing speech synthesis

    Publication

    - Year 2010

    The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

  • Virtual Keyboard controlled by eye gaze employing speech synthesis

    The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents...

    Full text to download in external service

  • Analysis of Lombard speech using parameterization and the objective quality indicators in noise conditions

    Publication

    - Year 2018

    The aim of the work is to analyze Lombard speech effect in recordings and then modify the speech signal in order to obtain an increase in the improvement of objective speech quality indicators after mixing the useful signal with noise or with an interfering signal. The modifications made to the signal are based on the characteristics of the Lombard speech, and in particular on the effect of increasing the fundamental frequency...

  • Computer-Aided Diagnosis of COVID-19 from Chest X-ray Images Using Hybrid-Features and Random Forest Classifier

    Publication

    - Healthcare - Year 2023

    In recent years, a lot of attention has been paid to using radiology imaging to automatically find COVID-19. (1) Background: There are now a number of computer-aided diagnostic schemes that help radiologists and doctors perform diagnostic COVID-19 tests quickly, accurately, and consistently. (2) Methods: Using chest X-ray images, this study proposed a cutting-edge scheme for the automatic recognition of COVID-19 and pneumonia....

    Full text available to download